The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 2

Scenario 2: loss of ocrmirror, both nodes down

(This is the follow-up to chapter 1)

Let’s investigate the vote count a little further by doing the following test:

  • First stop crs on both nodes
  • Then make the lun with ocrmirror unavailable to both nodes

What happens?

Let’s check the ocr status before starting crs on any node:

bash-3.00# ocrcheck
PROT-602: Failed to retrieve data from the cluster registry

The crs alert file shows:

2008-07-18 15:57:36.438
[client(24204)]CRS-1011:OCR cannot determine that the OCR content contains the latest updates. Details in /app/oracle/crs/log/nodea01/client/ocrcheck_24204.log.

and the mentioned ocrcheck_24204.log file:

Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2008 Oracle.
All rights reserved.
2008-07-18 15:57:36.405: [OCRCHECK][1]ocrcheck starts…
2008-07-18 15:57:36.437: [ OCRRAW][1]proprioini: disk 0 (/dev/oracle/ocr) doesn't have enough votes (1,2)

2008-07-18 15:57:36.438: [ OCRRAW][1]proprinit: Could not open raw device
2008-07-18 15:57:36.438: [ default][1]a_init:7!: Backend init unsuccessful : [26]
2008-07-18 15:57:36.439: [OCRCHECK][1]Failed to access OCR repository: [PROC-26: Error while accessing the physical storage]
2008-07-18 15:57:36.439: [OCRCHECK][1]Failed to initialize ocrchek2
2008-07-18 15:57:36.439: [OCRCHECK][1]Exiting [status=failed]…
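If you want to pull the vote information out of such a client log without reading it by eye, a minimal sketch could look like the following. The function name ocr_votes_from_log is hypothetical, and reading the pair as "votes found, votes needed" is an assumption based on the log above, not documented Oracle behaviour.

```shell
# Hypothetical helper: read an ocrcheck client log on stdin and print the
# two numbers from every "enough votes (x,y)" message as "x y".
# Assumption: x = votes found on the device, y = votes needed.
ocr_votes_from_log() {
    sed -n 's/.*enough votes (\([0-9][0-9]*\),\([0-9][0-9]*\)).*/\1 \2/p'
}

# Example call, using the log file named in the alert message above:
#   ocr_votes_from_log < /app/oracle/crs/log/nodea01/client/ocrcheck_24204.log
```

The function prints nothing when the log contains no such message, so an empty result simply means the vote problem was not reported.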

I didn’t try to start CRS at this point, but I am sure it would have failed with the same error messages. Two messages in the ocrcheck log deserve attention. The PROC-26 message explains what the real problem is: one of the ocr devices is unavailable (“Error while accessing the physical storage”). This is exactly the information you need to troubleshoot a failing crs start. The “doesn’t have enough votes (1,2)” message tells us more about the internals: the remaining ocr has only 1 vote, which isn’t enough. So that’s rule 3 in the world of CRS. Read and remember once and for all:

  1. Rule 1: CRS can start if it finds 2 ocr devices each having one vote (the normal case)
  2. Rule 2: CRS can start if it finds 1 ocr having 2 votes (the case after losing the ocrmirror).
  3. Rule 3: CRS CANNOT start if it finds only one ocr device having only 1 vote
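The three rules can be summarised as "the visible ocr devices must hold at least 2 votes in total". That summary is only an interpretation of the tests in this series, not something taken from Oracle documentation. As a minimal sketch, with the hypothetical function name crs_can_start:

```shell
# Hypothetical helper: takes one vote count per visible ocr device and
# returns success (0) if, by the three rules above, CRS should be able to start.
# Assumption: the rules boil down to "total visible votes >= 2".
crs_can_start() {
    total=0
    for votes in "$@"; do
        total=$((total + votes))
    done
    [ "$total" -ge 2 ]
}

crs_can_start 1 1 && echo "rule 1: two devices, 1 vote each: can start"
crs_can_start 2   && echo "rule 2: one device, 2 votes: can start"
crs_can_start 1   || echo "rule 3: one device, 1 vote: cannot start"
```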

Now suppose this is a production environment and we really want to get the cluster and databases up. How do we proceed? We can manually tell the cluster that the remaining ocr is valid and up to date. Note, however, that this is an important decision: it is up to you to know that the remaining ocr really is valid. If you have been playing around too much (missing luns, adding services, then missing the other lun, etc.), the contents of the ‘invisible’ ocrmirror may be more recent than those of the visible ocr. If you then tell crs that the ocr is valid, you may lose important information that exists only in the ocrmirror. In most cases, though, you will know very well what to do, and issue as root:

ocrconfig -overwrite

Now find the most recent file in $ORA_CRS_HOME/log/nodename/client and see that it contains:

Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2008 Oracle.
All rights reserved.
2008-07-18 15:59:56.828: [ OCRCONF][1]ocrconfig starts…
2008-07-18 15:59:58.644: [ OCRRAW][1]propriowv_bootbuf: Vote information on disk 0 [/dev/oracle/ocr] is adjusted from [1/2] to [2/2]

2008-07-18 15:59:58.644: [ OCRCONF][1]Successfully overwrote OCR configuration on disk
2008-07-18 15:59:58.644: [ OCRCONF][1]Exiting [status=success]…
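To locate the most recent file in the client log directory mentioned above, something like the following sketch can be used. The function name newest_client_log is hypothetical, and the approach assumes log file names without newlines, which holds for names such as ocrcheck_24204.log.

```shell
# Hypothetical helper: print the name of the most recently modified file
# in the directory given as $1 (empty output if the directory is empty).
newest_client_log() {
    ls -t "$1" 2>/dev/null | head -n 1
}

# Example call (replace nodename with your own node name):
#   newest_client_log "$ORA_CRS_HOME/log/nodename/client"
```

Because ls -t sorts by modification time, newest first, the first line is the file that ocrconfig has just written.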

So now we are in the situation of scenario 1: one ocr device available having 2 votes. This gives:

Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     295452
         Used space (kbytes)      :       5112
         Available space (kbytes) :     290340
         ID                       : 1930338735
         Device/File Name         : /dev/oracle/ocr
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/oracle/ocrmirror
                                    Device/File unavailable

         Cluster registry integrity check succeeded
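When checking many nodes, it can help to reduce this ocrcheck output to one line per device. The following is a minimal sketch that assumes the 10.2 output layout shown above, where each "Device/File Name" line is followed by its status line. The function name ocr_device_status is hypothetical.

```shell
# Hypothetical helper: read ocrcheck output on stdin and print
# "<device>: <status>" for every device listed.
# Assumption: each "Device/File Name" line is followed by its status line.
ocr_device_status() {
    awk '
        /Device\/File Name/ {
            sub(/^[^:]*:[ \t]*/, "")   # keep only the device path
            dev = $0
            next
        }
        dev != "" {
            sub(/^[ \t]+/, "")         # strip the indentation of the status line
            print dev ": " $0
            dev = ""
        }
    '
}

# Example call (as root on a cluster node):
#   ocrcheck | ocr_device_status
```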

And the crs startup happens without problem:

-bash-3.00# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

Note however that you still have to recover from this as in scenario 1 using “ocrconfig -replace ocrmirror /dev/…” once the storage box containing the ocrmirror is available again.

Conclusion of scenario 2

When losing an ocr or ocrmirror while crs is down on both nodes, Oracle is not able to update the vote count of the remaining ocr (no crs processes are running to do this). As a consequence, it is up to you to do that using the “overwrite” option of ocrconfig. After this, CRS can start as normal, and later on you can recover when the ocrmirror becomes available again or when you can use another device for ocrmirror.

So this looks great, let’s buy that additional storage box now.

But I am still not satisfied. Until now we have had ‘clean errors’, i.e. both nodes were up or down, and the storage disappeared from both nodes at the same time. Let’s play a little more in the next chapters…

4 Responses to The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 2

  1. [...] Geert De Paep-The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 1 [...]

  2. rajkumar says:

    Excellnt information about OCR, however i could not able to find Chapter 1,could you please check it.
