The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 1

Scenario 1: loss of ocrmirror, both nodes up

(This is the follow-up to the article “Introduction”.)

Facts
  • CRS is running on all nodes
  • The storage box containing the ocrmirror is made unavailable to both hosts, simulating a crash of one storage box (a host-side way to emulate this is sketched right after this list).
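
In this test the LUN was hidden at the SAN switch. If you cannot touch the switch, a minimal host-side alternative is to offline the underlying SCSI device. This is only a sketch, assuming a Linux test box where the ocrmirror LUN shows up as the hypothetical device /dev/sdX; run it as root on both nodes:

echo offline > /sys/block/sdX/device/state   # all further I/O to the LUN now fails with errors
echo running > /sys/block/sdX/device/state   # later: bring the device back online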

What happens?

The CRS alert file ($ORA_CRS_HOME/log/<hostname>/alert.log) of node 1 shows:

2008-07-18 15:30:23.176
[crsd(6563)]CRS-1006:The OCR location /dev/oracle/ocrmirror is inaccessible. Details in /app/oracle/crs/log/nodea01/crsd/crsd.log.

And the CRS logfile of node 1 shows:

2008-07-18 15:30:23.176: [ OCROSD][14]utwrite:3: problem writing the buffer 1c03000 buflen 4096 retval -1 phy_offset 102400 retry 0
2008-07-18 15:30:23.176: [ OCROSD][14]utwrite:4: problem writing the buffer errno 5 errstring I/O error
2008-07-18 15:30:23.177: [ OCRRAW][768]propriowv_bootbuf: Vote information on disk 0 [/dev/oracle/ocr] is adjusted from [1/2] to [2/2]

There is nothing in the CRS alert file or crsd logfile of node 2 (although node 2 cannot see the LUN either).
On both nodes we have:

(/app/oracle) $ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     295452
         Used space (kbytes)      :       5112
         Available space (kbytes) :     290340
         ID                       : 1930338735
         Device/File Name         : /dev/oracle/ocr
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/oracle/ocrmirror
                                    Device/File unavailable
         Cluster registry integrity check succeeded

CRS continues to work normally on both nodes.
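
For completeness, two quick ways to confirm on each node that the stack is indeed healthy are the standard 10.2 commands below (output not reproduced here):

crsctl check crs   # reports the health of the CSS, CRS and EVM daemons
crs_stat -t        # shows the state of all registered cluster resources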

Discussion

This test indicates that the loss of the ocrmirror leaves the cluster running normally. In other words, after a crash of one storage box, production simply continues. Great!

However, I’m not easily satisfied, so I still have a lot of questions: how do we recover from this, what happens internally, can we still change/update the OCR, …? Let’s investigate.

The most interesting parts of the output above deserve a closer look. The fact that the ocrmirror device file is reported unavailable makes sense. Remember, however, the other message in the crsd logfile: the vote count of the remaining ocr was adjusted from 1 to 2.
Let’s see what happens if we now stop and start CRS on node 1, while the ocrmirror is still unavailable.
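
On 10.2 the full clusterware stack on a node is stopped and started as root with:

crsctl stop crs    # stop the clusterware stack on this node
crsctl start crs   # start it again afterwards
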
Stopping CRS on node 1 happens as usual, with no error messages. However, at the moment CRS is stopped on node 1, a very interesting message appears in the crsd logfile of node 2:

2008-07-18 15:34:38.504: [ OCRMAS][23]th_master:13: I AM THE NEW OCR MASTER at incar 2. Node Number 2
2008-07-18 15:34:38.511: [ OCRRAW][23]proprioo: for disk 0 (/dev/oracle/ocr), id match (1), my id set (1385758746,1866209186) total id sets (1), 1st set (1385758746,1866209186), 2nd set (0,0) my votes (2), total votes (2)
2008-07-18 15:34:38.514: [ OCROSD][23]utread:3: problem reading buffer 162e000 buflen 4096 retval -1 phy_offset 106496 retry 0
2008-07-18 15:34:38.514: [ OCROSD][23]utread:4: problem reading the buffer errno 5 errstring I/O error
2008-07-18 15:34:38.559: [ OCRMAS][23]th_master: Deleted ver keys from cache (master)

I am the new master??? So it looks as if node 1 was the master until we stopped CRS there. This also explains why, when the LUN became unavailable, only node 1 wrote messages in its logfiles: nothing was written into the logfiles of node 2 at that time, because node 2 was not the master! A very interesting concept: in a RAC cluster, one node is the CRS master and is responsible for updating the vote count in the OCR. I never read that in the doc…. Also note that the new master sees that the ocr now holds 2 votes: “my votes (2)”.
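
As far as I know there is no official command to ask which node currently is the OCR master, but since crsd writes a message like the one above at every master change, a simple grep per node tells the story (paths as in this installation):

grep "OCR MASTER" /app/oracle/crs/log/*/crsd/crsd.log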

Also, at the time of stopping CRS on node 1, the crs alert file of node 2 showed:

2008-07-18 15:34:38.446
[evmd(18282)]CRS-1006:The OCR location /dev/oracle/ocrmirror is inaccessible. Details in /app/oracle/crs/log/nodeb01/evmd/evmd.log.
2008-07-18 15:34:38.514
[crsd(18594)]CRS-1006:The OCR location /dev/oracle/ocrmirror is inaccessible. Details in /app/oracle/crs/log/nodeb01/crsd/crsd.log.
2008-07-18 15:34:38.558
[crsd(18594)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870336 to 169870336. Details in /app/oracle/crs/log/nodeb01/crsd/crsd.log.
2008-07-18 15:34:55.153

So it looks as if node 2, on becoming the master, checks the availability of the ocrmirror again and sees that it is still not available.

Now let’s start CRS on node 1 again; maybe it becomes the master again?… Not really. The only thing we see in its crsd logfile is:

2008-07-18 15:39:19.603: [ CLSVER][1] Active Version from OCR:10.2.0.4.0
2008-07-18 15:39:19.603: [ CLSVER][1] Active Version and Software Version are same
2008-07-18 15:39:19.603: [ CRSMAIN][1] Initializing OCR
2008-07-18 15:39:19.619: [ OCRRAW][1]proprioo: for disk 0 (/dev/oracle/ocr), id match (1), my id set (1385758746,1866209186) total id sets (1), 1st set (1385758746,1866209186), 2nd set (0,0) my votes (2), total votes (2)

Recovery

Now, how do we get things back to normal? Let’s first make the LUN visible again on the SAN switch. At that moment nothing happens in any logfile, so CRS does not seem to poll to check whether the ocrmirror is back. However, when we now execute an ocrcheck, we get:

Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     295452
         Used space (kbytes)      :       5112
         Available space (kbytes) :     290340
         ID                       : 1930338735
         Device/File Name         : /dev/oracle/ocr
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/oracle/ocrmirror
                                    Device/File needs to be synchronized with the other device

         Cluster registry integrity check succeeded

Again, this makes sense. While the ocrmirror was unavailable, you may have added services, instances or whatever, so the contents of the (old) ocrmirror may differ from those of the current ocr. In our case nothing was changed at cluster level, so theoretically the contents of ocr and ocrmirror should still be identical. Still, we get the message above. Anyway, the way to synchronize the ocrmirror again is to issue as root:

ocrconfig -replace ocrmirror /dev/oracle/ocrmirror

This will copy the contents of the ocr over the ocrmirror located at /dev/oracle/ocrmirror. In other words, it creates a new ocrmirror in location /dev/oracle/ocrmirror as a copy of the existing ocr. Be careful with the syntax; do not use “-replace ocr” when it is the ocrmirror that is corrupt.
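
For the opposite case, i.e. the device holding the ocr itself was lost while the ocrmirror survived, the corresponding command (keeping the device names used in this article) would be:

ocrconfig -replace ocr /dev/oracle/ocr
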
At that time, we see in the crs logfile on both nodes:

2008-07-18 15:51:06.254: [ OCRMAS][25]th_master: Deleted ver keys from cache (non master)
2008-07-18 15:51:06.263: [ OCRRAW][30]proprioo: for disk 0 (/dev/oracle/ocr), id match (1), my id set (1385758746,1866209186) total id sets (2), 1st set (1385758746,1866209186), 2nd set (1385758746,1866209186) my votes (1), total votes (2)
2008-07-18 15:51:06.263: [ OCRRAW][30]proprioo: for disk 1 (/dev/oracle/ocrmirror), id match (1), my id set (1385758746,1866209186) total id sets (2), 1st set (1385758746,1866209186), 2nd set (1385758746,1866209186) my votes (1), total votes (2)
2008-07-18 15:51:06.364: [ OCRMAS][25]th_master: Deleted ver keys from cache (non master)

and in the alert file:

2008-07-18 15:51:06.246
[crsd(13848)]CRS-1007:The OCR/OCR mirror location was replaced by /dev/oracle/ocrmirror.

Note again the vote counts in the messages above: each OCR device has 1 vote again. And all is OK again:

Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     295452
         Used space (kbytes)      :       5112
         Available space (kbytes) :     290340
         ID                       : 1930338735
         Device/File Name         : /dev/oracle/ocr
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/oracle/ocrmirror
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded

Conclusion of scenario 1

Losing the storage box containing the ocrmirror is no problem (the same is true for losing the ocr while the ocrmirror remains available). Moreover, it can be recovered without having to stop the cluster (the restart of CRS on node 1 above was for educational purposes only). This corresponds with what is stated in the RAC FAQ in Metalink Note 220970.1: “If the corruption happens while the Oracle Clusterware stack is up and running, then the corruption will be tolerated and the Oracle Clusterware will continue to function without interruptions” (however, I think the logfiles above give you much more insight into what really happens).
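
Not part of the test above, but a useful safety net before doing this kind of online repair: 10.2 keeps automatic OCR backups and also allows a manual logical export, both via ocrconfig as root (the export file name below is just an example):

ocrconfig -showbackup                        # list the automatically taken OCR backups
ocrconfig -export /tmp/ocr_before_repair.dmp # manual logical export of the current OCR contents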

Another important concept, however, is the story of the vote count. The test above shows that CRS is able to start if it finds 2 OCR devices each holding one vote (the normal case), or if it finds 1 OCR device holding 2 votes (the case after losing the ocrmirror). Note that at the moment of the failure, Oracle could only increase the vote count of the ocr from 1 to 2 because CRS was running.

In the next chapter, we will do this over again, but with both nodes down…

3 Responses to The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 1

  1. Freek says:

    I have done the same test during the setup of a Linux RAC using NFS mounts, and there the failover from the ocr to the ocrmirror did not happen. This is in fact normal, given that the mandatory mount options for an OCR volume include hard and nointr. (Oracle refuses to use a volume that does not have these options explicitly set, even if they are the default settings.)

    I have opened a case with Oracle about this, but after pointing the finger at the OS (which was OEL 🙂 ) and the storage (NetApp), they are now claiming that there is no failover between the ocr and ocrmirror and that I should file it as a feature enhancement.
    And that was after I pointed them to the section in the RAC FAQ that describes what should happen when the ocr volume is lost …

    PS) The date on the post seems to be wrong; it probably has the date of when you first started writing the article.
