A highly recommended ASM patch – failing LUN

The following is a real-life experience with failing disks/LUNs and how ASM reacts to them. We used 10.2.0.4 on Solaris with an HDS storage box and MPxIO. We created an ASM diskgroup out of two mirrored disks (one disk per failgroup) and then made the LUNs unavailable to the host (hide LUN). The result was not really what we expected.
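
For context, this is a minimal sketch of how such a two-way mirrored (normal redundancy) diskgroup can be created. The diskgroup, failgroup and disk names below are taken from the listings later in this post; the exact statement we ran may have differed:

-- normal redundancy: ASM mirrors every extent across the two failgroups
CREATE DISKGROUP ARESDATA NORMAL REDUNDANCY
  FAILGROUP A DISK '/dev/oracle/asm/aresdata-a' NAME ARESDATAA
  FAILGROUP B DISK '/dev/oracle/asm/aresdata-b' NAME ARESDATAB;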

After hiding the LUN, the ASM alert log shows the following ‘normal’ messages:

15:26:57+: Errors in file /app/oracle/admin/+ASM/bdump/+asm1_gmon_10611.trc:
15:26:57+: ORA-27091: unable to queue I/O
15:26:57+: ORA-27072: File I/O error
15:26:57+: SVR4 Error: 5: I/O error
15:26:57+: Additional information: 4
15:26:57+: Additional information: 2048
15:26:57+: Additional information: -1

and some time later

15:32:23+: WARNING: offlining disk 0.4042332304 (ARESDATAA) with mask 0x3
15:32:23+: WARNING: offlining disk 0.4042332304 (ARESDATAA) with mask 0x3

and

15:32:33+: WARNING: kfk failed to open a disk[/dev/oracle/asm/aresdata-a]
15:32:33+: Errors in file /app/oracle/admin/+ASM/udump/+asm1_ora_18313.trc:
15:32:33+: ORA-15025: could not open disk '/dev/oracle/asm/aresdata-a'
15:32:33+: ORA-27041: unable to open file
15:32:33+: SVR4 Error: 5: I/O error
15:32:33+: Additional information: 3
15:32:33+: NOTE: PST update: grp = 1, dsk = 0, mode = 0x6
15:32:35+: NOTE: PST update: grp = 1, dsk = 0, mode = 0x4
15:32:35+: NOTE: group ARESDATA: relocated PST to: disk 0001 (PST copy 0)
15:32:35+: NOTE: PST update: grp = 1, dsk = 0, mode = 0x4
15:32:35+: NOTE: cache closing disk 0 of grp 1: ARESDATAA
15:32:35+: NOTE: cache closing disk 0 of grp 1: ARESDATAA
15:33:50+: WARNING: PST-initiated drop disk 1(780265568).0(4042332304) (ARESDATAA)
15:33:50+: NOTE: PST update: grp = 1
15:33:50+: NOTE: group ARESDATA: relocated PST to: disk 0001 (PST copy 0)
15:33:50+: NOTE: requesting all-instance membership refresh for group=1
15:33:50+: NOTE: membership refresh pending for group 1/0x2e81e860 (ARESDATA)
15:33:50+: SUCCESS: refreshed membership for 1/0x2e81e860 (ARESDATA)
15:33:53+: SUCCESS: PST-initiated disk drop completed
15:33:53+: SUCCESS: PST-initiated disk drop completed
15:33:56+: NOTE: starting rebalance of group 1/0x2e81e860 (ARESDATA) at power 1
15:33:56+: Starting background process ARB0
15:33:56+: ARB0 started with pid=21, OS id=19285
15:33:56+: NOTE: assigning ARB0 to group 1/0x2e81e860 (ARESDATA)
15:33:56+: NOTE: F1X0 copy 1 relocating from 0:2 to 1:2
15:33:56+: NOTE: F1X0 copy 2 relocating from 1:2 to 0:2
15:33:56+: NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 65534:4294967294
15:33:56+: NOTE: X->S down convert bast on F1B3 bastCount=2

15:34:14+: NOTE: group ARESDATA: relocated PST to: disk 0001 (PST copy 0)
15:34:14+: WARNING: offline disk number 0 has references (1394 AUs)
15:34:14+: NOTE: PST update: grp = 1
15:34:14+: NOTE: group ARESDATA: relocated PST to: disk 0001 (PST copy 0)

However, every query on the v$asm_disk view now hangs for 30 to 90 seconds, and the same happens when adding a tablespace in a database. It looks as if ASM, whenever it tries to access the failed disk, waits for OS timeouts. I assume that every operation that needs to touch the ASM disks (e.g. a data file autoextend that has to allocate space in the diskgroup, creation of an archivelog, …) suffers from these timeouts. Not really acceptable for a production environment. I do want to mention that dd, and even Orion with asynchronous I/O, detect the error immediately without waiting for any timeout.
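
The query that hangs is nothing exotic; anything that reads the disk headers is affected. A simple illustration (just a sketch, not our exact monitoring query):

-- even this trivial overview stalls while ASM retries the failed path
select group_number, name, path, mount_status, mode_status, state
from   v$asm_disk;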

You can clearly see the waits when you truss the server process of the ASM SQL*Plus session while it runs a select on v$asm_disk. For each failed disk you get:

9377/1: 17.3106 open("/dev/oracle/asm/aresdata-a", O_RDWR|O_DSYNC) Err#5 EIO
9377/1: 18.3195 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 18.3198 open("/dev/oracle/asm/aresdata-a", O_RDWR|O_DSYNC) Err#5 EIO
9377/1: 19.3295 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 19.3298 open("/dev/oracle/asm/aresdata-a", O_RDWR|O_DSYNC) Err#5 EIO
9377/1: 20.3395 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 20.3398 open("/dev/oracle/asm/aresdata-a", O_RDWR|O_DSYNC) Err#5 EIO
9377/1: 21.3495 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 21.3497 open("/dev/oracle/asm/aresdata-a", O_RDWR|O_DSYNC) Err#5 EIO
9377/1: 22.3595 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 22.3598 open("/dev/oracle/asm/aresdata-a", O_RDWR|O_DSYNC) Err#5 EIO
9377/1: 22.3605 open("/dev/oracle/asm/aresdata-a", O_RDONLY|O_DSYNC) Err#5 EIO
9377/1: 23.3695 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 23.3697 open("/dev/oracle/asm/aresdata-a", O_RDONLY|O_DSYNC) Err#5 EIO
9377/1: 24.3795 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 24.3798 open("/dev/oracle/asm/aresdata-a", O_RDONLY|O_DSYNC) Err#5 EIO
9377/1: 25.3895 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 25.3897 open("/dev/oracle/asm/aresdata-a", O_RDONLY|O_DSYNC) Err#5 EIO
9377/1: 26.3995 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 26.3998 open("/dev/oracle/asm/aresdata-a", O_RDONLY|O_DSYNC) Err#5 EIO
9377/1: 27.4095 nanosleep(0xFFFFFFFF7FFF8FE0, 0xFFFFFFFF7FFF8FD0) = 0
9377/1: 27.4097 open("/dev/oracle/asm/aresdata-a", O_RDONLY|O_DSYNC) Err#5 EIO
9377/1: 27.4105 write(5, " * * * 2 0 0 8 - 0 7 -".., 27) = 27
9377/1: 27.4106 write(5, "\n", 1) = 1
9377/1: 27.4109 write(5, " W A R N I N G : k f k".., 62) = 62
9377/1: 27.4109 write(5, "\n", 1) = 1
9377/1: 27.4111 close(6) = 0
9377/1: 27.4111 open("/app/oracle/admin/+ASM/bdump/alert_+ASM2.log", O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE, 0660) = 6
9377/1: 27.4118 time() = 1217417698
9377/1: 27.4119 writev(6, 0xFFFFFFFF7FFF8080, 3) = 88

So it tries to open each failed disk six times in read-write mode and then six more times in read-only mode, sleeping about a second between attempts. A loss of 12 valuable seconds per disk…

At the same time, the OS messages file logs the following warning every second:

Jul 30 11:01:05 node1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060e800562f400000062f4000000d5 (ssd63):
Jul 30 11:01:05 node1 offline or reservation conflict

Jul 30 11:01:06 node1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060e800562f400000062f4000000d5 (ssd63):
Jul 30 11:01:06 node1 offline or reservation conflict

We would expect ASM to be intelligent enough to register that the disk has failed, but it obviously keeps trying to access it, incurring the waits and timeouts every time.
For reference, this is the state of the disks after the failure:

DISKGROUP  PATH                            Total   Used  %Usd ST  Header    FAILGROUP   STATE      DiskName
---------- ------------------------------ ------ ------ ----- --- --------- ----------- ---------- ---------------
ARESDATA   /dev/oracle/asm/aresdata-b      46068   1396     3 ONL MEMBER    B           MOUNTED    ARESDATAB

ARESDATA                                   46068   1396     3 OFF UNKNOWN   A           MOUNTED    ARESDATAA
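
The overview above (and the similar listings further down) is based on the ASM views. A simplified query along these lines, without the SQL*Plus column formatting, gives a comparable picture (a sketch, not the exact script used for the listings):

-- outer join so a disk that has lost its diskgroup association still shows up
select dg.name                 dg_name,
       d.path,
       d.total_mb,
       d.total_mb - d.free_mb  used_mb,
       d.mode_status           st,
       d.header_status         header,
       d.failgroup,
       d.state                 disk_state,
       d.name                  disk_name,
       d.mount_status          mount_s
from   v$asm_disk d, v$asm_diskgroup dg
where  dg.group_number (+) = d.group_number;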

I opened an SR on Metalink and uploaded every trace I could generate (*). And guess what: possibly because of (*), I was routed straight to an excellent engineer who immediately identified the problem as a known bug and asked development to provide a patch for 10.2.0.4 (which did not exist yet at that time). It took only five days for the patch to become available, and it solves the problem completely. After applying it, every select on v$asm_disk returns immediately.

This is it:

Patch          6278034
Description    WHEN SWITCHING OFF ONE ARRAY CONTAINING ONE FAILGROUP, THE PERFORMANCE TURNS BAD
Product        RDBMS Server
Release        10.2.0.3, 10.2.0.4
Platform       Sun Solaris SPARC (64-bit)
Last Updated   04-AUG-2008
Size           97K (99336 bytes)
Classification General

The patch exists for 10.2.0.3 as well. I would recommend installing it in the ORACLE_HOME from which ASM runs. However, I have no idea whether the problem also occurs on non-Solaris platforms.

Note: to resync the disks after the LUN is available again, use the ALTER DISKGROUP ADD FAILGROUP x DISK '/dev/path/somedevice' NAME some_new_name [FORCE] command. This is not as straightforward as it sounds; it turns out that trying to offline or drop the failed disk will not work. For example:

==================== OVERVIEW OF ASM DISKS ======================================

DISKGROUP  PATH                            Total   Used  %Usd ST  Header    FAILGROUP   STATE      DiskName        MOUNT_S
---------- ------------------------------ ------ ------ ----- --- --------- ----------- ---------- --------------- -------
ARESARCHA  /dev/oracle/asm/aresarch-a      30708  11226    36 ONL MEMBER    AREAARCHA   NORMAL     AREAARCHA       CACHED
ARESARCHB  /dev/oracle/asm/aresarch-b      30708  11202    36 ONL MEMBER    ARESARCHB   NORMAL     ARESARCHB       CACHED
ARESDATA   /dev/oracle/asm/aresdata-a      46068   1412     3 ONL MEMBER    A           NORMAL     ARESDATAA       CACHED

ARESDATA                                       0      0       OFF UNKNOWN   B           HUNG       ARESDATAB       MISSING

Trying to add it with the same name as before:
SQL> alter diskgroup ARESDATA add failgroup B disk '/dev/oracle/asm/aresdata-b' name ARESDATAB force;
alter diskgroup ARESDATA add failgroup B disk '/dev/oracle/asm/aresdata-b' name ARESDATAB force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15010: name is already used by an existing ASM disk

Adding it using a new name:

SQL> alter diskgroup ARESDATA add failgroup B disk '/dev/oracle/asm/aresdata-b' name ARESDATAB2 force;
alter diskgroup ARESDATA add failgroup B disk '/dev/oracle/asm/aresdata-b' name ARESDATAB2 force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15034: disk '/dev/oracle/asm/aresdata-b' does not require the FORCE option

SQL> alter diskgroup ARESDATA add failgroup B disk '/dev/oracle/asm/aresdata-b' name ARESDATAB2;

Diskgroup altered.

I assume that whether the FORCE option is needed depends on the kind of failure you had.
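
In general, FORCE comes into play when the candidate disk's header still identifies it as a member of a diskgroup. If you want to check up front, the header status is visible in v$asm_disk; a quick sketch (the LIKE pattern simply matches this environment's device naming):

-- CANDIDATE or FORMER headers normally add cleanly;
-- a header still showing MEMBER is where FORCE is typically required
select path, header_status, mount_status, mode_status
from   v$asm_disk
where  path like '/dev/oracle/asm/%';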

Sometimes I see that the rebalance does not start automatically. Then you get the following status:
DISKGROUP  PATH                            Total   Used  %Usd ST  Header    FAILGROUP   STATE      DiskName        MOUNT_S
---------- ------------------------------ ------ ------ ----- --- --------- ----------- ---------- --------------- -------
ARESARCHA  /dev/oracle/asm/aresarch-a      30708  11322    36 ONL MEMBER    AREAARCHA   NORMAL     AREAARCHA       CACHED
ARESARCHB  /dev/oracle/asm/aresarch-b      30708  11298    36 ONL MEMBER    ARESARCHB   NORMAL     ARESARCHB       CACHED
ARESDATA   /dev/oracle/asm/aresdata-a      46068   1412     3 ONL MEMBER    A           NORMAL     ARESDATAA       CACHED
           /dev/oracle/asm/aresdata-b      46068      2     0 ONL MEMBER    B           NORMAL     ARESDATAB2      CACHED

ARESDATA                                       0      0       OFF UNKNOWN   B           FORCING    ARESDATAB       MISSING

In that case, start it manually using:

SQL> alter diskgroup ARESDATA rebalance;

Diskgroup altered.
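
To see whether a rebalance is actually running, and how far along it is, check v$asm_operation; a small sketch:

-- one row per ongoing operation; no rows means no rebalance is running
select group_number, operation, state, power, sofar, est_work, est_minutes
from   v$asm_operation;

If needed, the rebalance can also be sped up with the POWER clause (e.g. ALTER DISKGROUP ARESDATA REBALANCE POWER 4), at the price of more I/O while it runs.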
