This post is to warn you of a potential problem you may encounter. A combination where it definitely occurs is 10.2 RAC with ASM on Solaris. The problem is the following: due to heavy database activity, no cleanup of archivelog files, multiple databases, and multiple instances per database, the number of archivelog files in ASM had grown to over 100.000 (in my case). As a result, a query on v$asm_file takes a minute or more, up to 3 minutes. Not a problem in itself, were it not that the query on v$asm_file turns out to block all file manipulation operations in ASM, including the creation and deletion of archivelogs and the registration of archivelogs in the controlfile. This generates CF enqueue waits in the databases using ASM (for lgwr and arc), and very soon user processes start waiting on log file sync because lgwr is blocked. In this way your production may be frozen until the query on v$asm_file ends or is interrupted. Since the emagent can query v$asm_file, and archivelogs are being created and registered all the time, this problem may occur quite often, especially in Data Guard environments where a lot of archivelog manipulation is done.
This behaviour is hard to believe, but I can confirm that I have seen it with my own eyes and have analyzed, tested and reproduced it myself. The root cause is the combination of the slow ASM query and its blocking effect, two separate things. I have not yet had a chance to test the same on Linux or any other platform. It might be Solaris specific, because bug 6761100 exists: "Query on V$asm_files very slow on Solaris compared to Linux". If the query on v$asm_file were fast, you would probably never run into the wait events mentioned above. Nor can I confirm whether it only occurs in RAC, or also in non-RAC installations.
So to me it looks as if ASM isn't designed (at the moment) for very large numbers of files. However, if you have a 3-node RAC cluster with 4 databases on it, and each instance does a log switch every 15 minutes (because maybe there is a standby that should not lag behind too much), you produce 3 x 4 x 96 archives per day. Keeping these for three weeks already gives about 25.000 files. If then something accidentally goes wrong in the cleanup script, or batch jobs generate a lot of redo, you may end up with a still larger number of files.
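The arithmetic above can be put in a quick back-of-the-envelope sketch (a minimal illustration; the node, database and retention figures are just the example numbers from this post, not universal constants):

```python
# Back-of-the-envelope check of how fast archivelogs pile up in ASM.
# All figures are the example scenario from this post.
nodes = 3                           # RAC nodes in the cluster
databases = 4                       # databases on the cluster
switches_per_day = 24 * 60 // 15    # one log switch per 15 minutes = 96

# one archivelog per switch, per instance (nodes x databases instances)
archives_per_day = nodes * databases * switches_per_day
print(archives_per_day)             # 1152 archivelogs per day

retention_days = 21                 # keep three weeks
total_files = archives_per_day * retention_days
print(total_files)                  # 24192, roughly the 25.000 mentioned above
```

So even a modest, perfectly healthy cleanup policy already puts you within a factor of four or five of the file counts where I saw the problem.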
The most annoying thing about this is that you can't get rid of it. It turns out that removing the files afterwards doesn't really solve the problem. To me it looks as if, for each deleted file, something is left behind in ASM that still needs to be traversed during the query. Only emptying the diskgroup and recreating it with fewer files will solve the problem.
But I repeat: it is only the combination of the two issues (the slow query and its blocking effect) that causes trouble, and as far as I know, only in the combination 10.2 RAC on Solaris. For me, the query on v$asm_file may be as slow as hell, as long as it doesn't block anything else.
So you are warned: it is not a bad idea to keep the number of files in ASM relatively low (I would say below 10.000).
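You can turn that rule of thumb into a retention window by inverting the same arithmetic as above (again only a sketch under this post's assumptions; the 10.000 ceiling and the switch rates are the figures from this post, and `max_retention_days` is a hypothetical helper name, not anything Oracle provides):

```python
# Given a file-count ceiling for a diskgroup, estimate how many days
# of archivelogs you can keep before crossing it.
def max_retention_days(instances, switches_per_day, ceiling=10_000):
    """Whole days of archivelogs that stay under the file-count ceiling."""
    per_day = instances * switches_per_day
    return ceiling // per_day

# 3 nodes x 4 databases = 12 instances, each switching every 15 minutes:
print(max_retention_days(12, 96))   # 8 -> clean up after about a week
```

In other words, in the scenario described above the cleanup script would have to run with roughly a one-week retention to stay comfortably under 10.000 files.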
P.S. If you experienced similar behaviour in another hw/sw configuration, I am very interested in knowing the details of it.