Warning: ASM and a large number of files (on Solaris)

Mar 5, 2008

This post is to warn you of a potential problem you may encounter. One combination where it definitely occurs is 10.2 RAC and ASM on Solaris. The problem is the following: due to a lot of db activity, no cleanup of archivelog files, multiple databases and multiple instances per database, the number of archivelog files in ASM had grown to more than 100,000 (in my case). As a result, a query on v$asm_file takes 1 minute or more, up to 3 minutes. That would not be a problem in itself, except that the query on v$asm_file turns out to block all file manipulation operations in ASM, including creating or deleting archivelogs and registering archivelogs in the controlfile. This generates CF enqueue waits in the databases using ASM (for lgwr and arc), and very soon user processes start waiting on log file sync because lgwr is blocked. In this way your production may be frozen until the query on v$asm_file ends or is interrupted. Since emagent can query v$asm_file, and archivelogs are being created and registered all the time, this problem may occur very often, especially in data guard environments where a lot of archivelog manipulation is done.
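One way to keep an eye on this without querying v$asm_file itself is to count the archivelogs each database has registered in its own controlfile. The sketch below is only an illustration under assumptions: the query string, the `WARN_THRESHOLD` value and both helper functions are hypothetical names, not something from my setup. It expects the NAME values from v$archived_log to be fetched by whatever client you use, and a disk group shared by several databases needs the counts from all of them added up.

```python
from collections import Counter

# Assumed query, to be run in each database instance (not the ASM instance),
# so that it reads v$archived_log from the controlfile instead of v$asm_file.
ARCHIVELOG_QUERY = (
    "SELECT name FROM v$archived_log "
    "WHERE deleted = 'NO' AND name IS NOT NULL"
)

# Hypothetical alert level; the only figure known here is that
# trouble was seen at more than 100,000 files.
WARN_THRESHOLD = 50_000


def count_by_diskgroup(names):
    """Count archivelog names per ASM disk group.

    ASM file names start with '+<diskgroup>/'; anything else
    (plain filesystem paths, empty values) is ignored.
    """
    counts = Counter()
    for name in names:
        if name and name.startswith("+"):
            diskgroup = name[1:].split("/", 1)[0].upper()
            counts[diskgroup] += 1
    return counts


def diskgroups_over_threshold(names, threshold=WARN_THRESHOLD):
    """Return the disk groups holding at least `threshold` archivelogs."""
    counts = count_by_diskgroup(names)
    return sorted(dg for dg, n in counts.items() if n >= threshold)
```

Feed `diskgroups_over_threshold` the names returned by the query and alert on any disk group it reports. The controlfile only keeps a limited history of archivelog records, so treat the count as a lower bound rather than an exact figure.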

