The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 3

Scenario 3: Loss of OCRmirror from the OCR MASTER only

This is a follow-up to chapter 2.

As we saw in scenario 1, the OCR MASTER updates the vote count. Now let’s hide the ocrmirror from only one node, the OCR MASTER, while the other node continues to see the ocrmirror. Will CRS get confused by this?

Note: during this test, CRS is running on both nodes.

In this scenario, node 2 is the OCR MASTER. In fact, I haven’t found any command to query which node is the master. The only way I know to find out is to compare the crsd logfiles on all nodes and look for the most recent message “I AM THE NEW OCR MASTER”. If anyone knows a better way to determine this, please let me know.
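Since there is no command for this, the log comparison can at least be scripted. Below is a minimal Python sketch, assuming that you have copied the crsd.log text of each node to one machine, that the “I AM THE NEW OCR MASTER” line starts with the same timestamp prefix as the crsd.log excerpts further down (the exact layout of that line is not shown in this post), and that the node clocks are in sync, because timestamps from different nodes are compared. The function names are hypothetical helpers, not Oracle tools.

```python
import re
from datetime import datetime

# Timestamp prefix as seen in the crsd.log excerpts in this post, e.g.
# "2008-07-23 09:14:53.920: [ OCROSD][14]..." (assumed to apply to the master line too)
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
MARKER = "I AM THE NEW OCR MASTER"


def latest_master_claim(log_text):
    """Hypothetical helper: timestamp of the last master claim in one crsd.log, or None."""
    latest = None
    for line in log_text.splitlines():
        if MARKER not in line:
            continue
        m = TS_RE.match(line)
        if m is None:
            continue  # claim without a parsable timestamp: ignore it
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if latest is None or ts > latest:
            latest = ts
    return latest


def find_ocr_master(logs_by_node):
    """Hypothetical helper: {node: crsd.log text} -> node with the most recent claim."""
    best_node, best_ts = None, None
    for node, text in logs_by_node.items():
        ts = latest_master_claim(text)
        if ts is not None and (best_ts is None or ts > best_ts):
            best_node, best_ts = node, ts
    return best_node
```

Calling `find_ocr_master({"node1": text1, "node2": text2})` returns the name of the node whose latest claim is the most recent, or `None` if no log contains the message.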

When we hide the ocrmirror from node 2, we see in its alert file, as expected:

2008-07-23 09:14:53.921
[crsd(8215)]CRS-1006:The OCR location /dev/oracle/ocrmirror is inaccessible. Details in /app/oracle/crs/log/nodeb01/crsd/crsd.log.

and in its logfile:

2008-07-23 09:14:53.920: [ OCROSD][14]utwrite:3: problem writing the buffer 1a33000 buflen 4096 retval -1 phy_offset 143360 retry 0
2008-07-23 09:14:53.920: [ OCROSD][14]utwrite:4: problem writing the buffer errno 5 errstring I/O error
2008-07-23 09:14:53.922: [ OCRRAW][34]propriowv_bootbuf: Vote information on disk 0 [/dev/oracle/ocr] is adjusted from [1/2] to [2/2]

Nothing appears in the logfiles of the non-OCR-master, i.e. node 1. So far the situation is identical to scenario 1: it is the OCR master that updates the vote count after losing the other OCR device.
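When several such adjustments show up in a log, it helps to extract them mechanically. A minimal Python sketch, assuming the propriowv_bootbuf line format is exactly as quoted above; reading [x/y] as “votes on this disk / total votes” is my interpretation, based on the “my votes (N), total votes (N)” fields of the proprioo lines shown later. `parse_vote_adjustment` is a hypothetical helper.

```python
import re

# Matches the propriowv_bootbuf lines quoted above, e.g.
# "Vote information on disk 0 [/dev/oracle/ocr] is adjusted from [1/2] to [2/2]"
VOTE_ADJ_RE = re.compile(
    r"Vote information on disk (\d+) \[([^\]]+)\] "
    r"is adjusted from \[(\d+)/(\d+)\] to \[(\d+)/(\d+)\]"
)


def parse_vote_adjustment(line):
    """Hypothetical helper: return (disk, path, (old_mine, old_total), (new_mine, new_total))."""
    m = VOTE_ADJ_RE.search(line)
    if m is None:
        return None  # not a vote adjustment line
    disk, path, old_mine, old_total, new_mine, new_total = m.groups()
    return (int(disk), path,
            (int(old_mine), int(old_total)),
            (int(new_mine), int(new_total)))
```

Applied to the propriowv_bootbuf line above, it returns `(0, '/dev/oracle/ocr', (1, 2), (2, 2))`.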

The ocrcheck on node 2 (master) now gives:

         Device/File Name         : /dev/oracle/ocr
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/oracle/ocrmirror
                                    Device/File unavailable
         Cluster registry integrity check succeeded

But the output on node 1 (non-master) is different:

         Device/File Name         : /dev/oracle/ocr
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/oracle/ocrmirror
                                    Device/File needs to be synchronized with the other device
         Cluster registry integrity check succeeded

This makes sense: node 2 cannot see the device (Device/File unavailable), while node 1 sees both devices but with different vote counts (2 votes on ocr and 1 vote on ocrmirror), so it asks for a resync, just as in scenario 1 after the ocrmirror became visible again.

So now I want to try to confuse CRS. I will try to resync the ocrmirror from node 1, which would set the vote count of each device back to 1. Technically this is possible, because node 1 can see both devices. But if it succeeds, node 2 will be left with a single OCR device holding one vote, and we know from rule 3 that CRS cannot run in that case. Will CRS then crash on node 2?
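The question can be framed as a simple vote count. The sketch below is a simplified model, assumed to agree with the cases described in this series (a single device holding 2 of 2 votes is enough, a single device holding 1 of 2 is not); it is not Oracle’s code and not a literal restatement of rule 3. It checks whether the devices a node can see hold a strict majority of the votes. `has_vote_majority` is a hypothetical helper.

```python
def has_vote_majority(visible):
    """Simplified model (an assumption, not Oracle's code).

    visible maps each OCR device the node can open to (my_votes, total_votes),
    the two numbers the crsd log prints for that device. The node is considered
    able to run only if the visible devices hold a strict majority of the votes.
    """
    if not visible:
        return False
    total = max(t for _, t in visible.values())
    seen = sum(v for v, _ in visible.values())
    return 2 * seen > total


# Node 2 (master) right now: only ocr is visible, and it holds both votes.
print(has_vote_majority({"/dev/oracle/ocr": (2, 2)}))  # True
# Node 2 if node 1's resync succeeded: ocr would hold only 1 of 2 votes.
print(has_vote_majority({"/dev/oracle/ocr": (1, 2)}))  # False
```

In this model the resync from node 1 would take node 2 from a majority to no majority, which is exactly why the attempt is interesting.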

So on node 1 we run:

-bash-3.00# ocrconfig -replace ocrmirror /dev/oracle/ocrmirror

Bad luck: it fails with

PROT-21: Invalid parameter

Very clear, right… The interesting part, however, appears in the crsd logfile of node 2:

2008-07-23 09:19:34.712: [ OCROSD][32]utdvch:0:failed to open OCR file/disk /dev/oracle/ocrmirror, errno=5, os err string=I/O error
2008-07-23 09:19:34.712: [ OCRRAW][32]dev_replace: master could not verify the new disk (8)
[ OCRSRV][32]proas_replace_disk: Failed in changing configurations in the Master 8

So this teaches us that when the ocrconfig command is run on node 1, which is not the master, node 1 sends the request to node 2, the master, and node 2 executes it. What does NOT happen is that CRS crashes, or that node 1 takes over the mastership from node 2. Nice to know.

Very illogical, however, is that after the last command above, the CRS alert file of node 2 shows:

2008-07-23 09:19:34.711
[crsd(8215)]CRS-1007:The OCR/OCR mirror location was replaced by /dev/oracle/ocrmirror.

This is WRONG. The ocrmirror was not replaced. The message should be: “Trying to replace the OCR/OCR mirror location by /dev/oracle/ocrmirror”. Just so you know.

The logs on node 1 are correct. Find the latest log in the “client” directory and you will read:

Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2008 Oracle. All rights reserved.
2008-07-23 09:19:34.694: [ OCRCONF][1]ocrconfig starts…
2008-07-23 09:19:34.716: [ OCRCLI][1]proac_replace_dev:[/dev/oracle/ocrmirror]: Failed. Retval [8]
2008-07-23 09:19:34.716: [ OCRCONF][1]The input OCR device either is identical to the other device or cannot be opened
2008-07-23 09:19:34.716: [ OCRCONF][1]Exiting [status=failed]…
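Finding “the latest log” and its outcome can also be scripted. A minimal Python sketch: the client directory path is left as a parameter because the post does not give it, “latest” is taken to mean most recently modified, and the regular expressions assume the `Retval [N]` and `Exiting [status=X]` formats shown above. `latest_log`, `ocrconfig_outcome` and `check_latest_client_log` are hypothetical helpers.

```python
import os
import re

STATUS_RE = re.compile(r"Exiting \[status=(\w+)\]")
RETVAL_RE = re.compile(r"Retval \[(\d+)\]")


def latest_log(directory):
    """Hypothetical helper: most recently modified regular file in directory, or None."""
    paths = (os.path.join(directory, name) for name in os.listdir(directory))
    files = [p for p in paths if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None


def ocrconfig_outcome(log_text):
    """Hypothetical helper: (status, retval) of the last ocrconfig run in a client log."""
    status = retval = None
    for line in log_text.splitlines():
        if "ocrconfig starts" in line:
            status = retval = None  # a new run starts; forget the previous one
        m = STATUS_RE.search(line)
        if m:
            status = m.group(1)
        m = RETVAL_RE.search(line)
        if m:
            retval = int(m.group(1))
    return status, retval


def check_latest_client_log(client_dir):
    """Read the newest client log and report how its last ocrconfig run ended."""
    path = latest_log(client_dir)
    if path is None:
        return None
    with open(path, encoding="utf-8", errors="replace") as f:
        return ocrconfig_outcome(f.read())
```

For the client log above, `ocrconfig_outcome` returns `("failed", 8)`; for a successful run without a Retval line it returns `("success", None)`.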

Conclusion: we cannot confuse CRS!

After making the ocrmirror visible again and reissuing the replace command on node 1, we get the following in the crs logfile on node 2 (master):

2008-07-23 09:27:15.384: [ OCRRAW][32]proprioo: for disk 0 (/dev/oracle/ocr), id match (1), my id set (1385758746,1866209186) total id sets (2), 1st set (1385758746,1866209186), 2nd set (1385758746,1866209186) my votes (2), total votes (2)
2008-07-23 09:27:15.384: [ OCRRAW][32]propriogid:1: INVALID FORMAT
2008-07-23 09:27:15.516: [ OCRRAW][32]propriowv_bootbuf: Vote information on disk 1 [/dev/oracle/ocrmirror] is adjusted from [0/0] to [1/2]
2008-07-23 09:27:15.517: [ OCRRAW][32]propriowv_bootbuf: Vote information on disk 0 [/dev/oracle/ocr] is adjusted from [2/2] to [1/2]
2008-07-23 09:27:15.518: [ OCRMAS][25]th_master: Deleted ver keys from cache (master)
2008-07-23 09:27:15.628: [ OCRMAS][25]th_master: Deleted ver keys from cache (master)

and the crs logfile of node 1 (non-master):

2008-07-23 09:27:15.543: [ OCRRAW][36]proprioo: for disk 0 (/dev/oracle/ocr), id match (1), my id set (1385758746,1866209186) total id sets (2), 1st set (1385758746,1866209186), 2nd set (1385758746,1866209186) my votes (1), total votes (2)
2008-07-23 09:27:15.543: [ OCRRAW][36]proprioo: for disk 1 (/dev/oracle/ocrmirror), id match (1), my id set (1385758746,1866209186) total id sets (2), 1st set (1385758746,1866209186), 2nd set (1385758746,1866209186) my votes (1), total votes (2)
2008-07-23 09:27:15.571: [ OCRMAS][25]th_master: Deleted ver keys from cache (non master)
2008-07-23 09:27:15.572: [ OCRMAS][25]th_master: Deleted ver keys from cache (non master)

and the client logfile of node 1:

Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2008 Oracle. All rights reserved.
2008-07-23 09:27:15.346: [ OCRCONF][1]ocrconfig starts…
2008-07-23 09:27:15.572: [ OCRCONF][1]Successfully replaced OCR and set block 0
2008-07-23 09:27:15.572: [ OCRCONF][1]Exiting [status=success]…

and all is OK again: each device has one vote, and we are back in the ‘normal’ situation.
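To check this end state without reading the lines by eye, the proprioo fields from node 1’s crs logfile above can be pulled out. A minimal Python sketch, assuming the line format is exactly as quoted above; `votes_per_disk` and `each_device_has_one_vote` are hypothetical helpers, and the “1 of 2 votes per device” test simply encodes the ‘normal’ two-device situation described in this post.

```python
import re

# Matches the proprioo lines quoted above: "proprioo: for disk 1 (/dev/oracle/ocrmirror),"
# followed later on the same line by "my votes (1), total votes (2)"
PROPRIOO_RE = re.compile(
    r"proprioo: for disk (\d+) \(([^)]+)\),.*my votes \((\d+)\), total votes \((\d+)\)"
)


def votes_per_disk(log_text):
    """Hypothetical helper: device path -> (my_votes, total_votes), last line wins."""
    votes = {}
    for line in log_text.splitlines():
        m = PROPRIOO_RE.search(line)
        if m:
            _disk, path, mine, total = m.groups()
            votes[path] = (int(mine), int(total))
    return votes


def each_device_has_one_vote(votes):
    """The 'normal' two-device state from this post: both devices hold 1 of 2 votes."""
    return len(votes) == 2 and all(v == (1, 2) for v in votes.values())
```

Fed with node 1’s two proprioo lines, `votes_per_disk` maps both `/dev/oracle/ocr` and `/dev/oracle/ocrmirror` to `(1, 2)`, and `each_device_has_one_vote` returns `True`.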

Conclusion

We cannot confuse CRS when the ocr or ocrmirror disappears from the OCR master node only.

But what if it disappears from the non-master node…? That’s material for the next chapter.


3 Responses to The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 3

  1. Freek says:

    The OCR backups are only done on the master node, so you could use ocrconfig -showbackup to know which node is the master.
    Of course, the automatic backup is only taken every 4 hours, so it could be that the master has changed (due to a failure) since the last backup.

  2. [...] is a follow-up of chapter 3. Let’s try to do the same thing as scenario 3, however now hiding the lun from a node NOT [...]

  3. [...] Geert De Paep-Scenario 3: Loss of OCRmirror from the OCR MASTER only [...]
