This year I became part of the parents council of my childrens school. And like every small organisation in Belgium (football clubs, gymnastics, schools, …) it is the time of year to do the annual mussel-event for fund raising (mmm, delicious). My school happened to use an Access-IIS-ASP application to do the billing when people leave the event (do the calculation for their food and drink consumptions). But o fortuna, the evening before the event, the application turned out to systematically crash when entering data, and murphy oh murphy, the author of it had just become father and could not be reached. Panic of course. Read the rest of this entry »
The following is a real life experience about failing disks/luns and how ASM reacts to this. We used 10.2.0.4 on Solaris with a HDS storage box and MPXio. We made an ASM diskgroup of 2 mirrorred disks. Then we made the luns unavailable to the hosts (hide lun). The result was not really what we expected. Read the rest of this entry »
For my Oracle VM test environment I have a server available with multiple internal disks of different size and speed. So I was wondering if it is possible to have all these disks used together for my virtual machines in Oracle VM.
If all disks would have been the same size and speed, I could easily use the internal raid controller to put them in mirror, stripe or raid5 and end up with one large volume, alias disk, for my Oracle VM. However due to the differences in characteristics of the disks (speed/size) this is not a good idea. So I started to look in Oracle VM Manager (the java console) to see what is possible.
It turned out soon to me that Oracle VM is designed for a different architecture: in fact the desired setup is to dispose of a (large) SAN box with shared storage that is available to multiple servers. Then all these servers can be put in a server pool, sharing the same storage. This setup allows live migration of running machines to another physical server. Of course this makes sense because it fits nicely in the concept of grid computing: if any physical server fails, just restart your virtual machine on another one, and add machines according to your performance needs. But it doesn’t help me: I don’t have got one storage with multiple servers, but I have one server with multiple disks.
So I started to browse a little in all the executables of the OVM installation, and I found under /usr/lib/ovs the ovs-makerepo script. According to me the architecture is as follows (as far as I can find on the internet, because there is not much clear documentation on this): when installing OVM, you have a /boot a / and a swap partition (just as in traditional linux) and OVM requires one large partition to be used for virtual machines, which will be mounted under /OVS. In this partition you find subdirectories “running_pool” which contains all the virtual machines that you have created and that you can start, and a subdirectory “seed_pool” which contains templates you can start from for creating new machines. There is also “local”, “remote” and “publish_pool”, however they were irrelevant for me at the moment and I didn’t try to figure out what they are used for.
With this in mind I can install Oracle VM on my first disk and end up with 4 partitions on /dev/sda:
Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 248895 25284 210761 11% /boot (sda2 is swap) /dev/sda3 4061572 743240 3108684 20% / /dev/sda4 24948864 22068864 2880000 89% /OVS
With this in mind I now want to add the space on my second disk (/dev/sdb) to this setup. So first I create one large partition on the disk using fdisk. Then I create an ocfs file system on it as follows:
[root@nithog ovs]# mkfs.ocfs2 /dev/sdb1 mkfs.ocfs2 1.2.7 Filesystem label= Block size=4096 (bits=12) Cluster size=4096 (bits=12) Volume size=72793694208 (17771898 clusters) (17771898 blocks) 551 cluster groups (tail covers 31098 clusters, rest cover 32256 clusters) Journal size=268435456 Initial number of node slots: 4 Creating bitmaps: done Initializing superblock: done Writing system files: done Writing superblock: done Writing backup superblock: 4 block(s) Formatting Journals: done Writing lost+found: done mkfs.ocfs2 successful
Initially I created the file system as ext3 which worked well. However there was one strange thing. This is what you get:
- Create a new (paravirtualized) (linux) virtual machine in this new (ext3-based) repository (see later how exactly)
- Specify a disk of e.g. 2Gb
- Complete the wizard
- This prepares a machine where you can start using the linux installer on the console to install the machine (do not start to install yet)
- Now look in …/running_pool/machine_name and see a file of 2Gb
- Now do du -sk on …/running_pool/machine and see that only 20Kb is used
- From the moment you start to partition your disk inside the virtual machine, the output of “du -sk” grows the same amount as the data you really put in it. So it behaves a bit like ‘dynamic provisioning’.
- Note however that ls -l shows a file of 2Gb at any time
I don’t know for the moment if this behaviour is caused by the fact that the file system is ext3, but anyway, I leave it up to you to judge if this is an advantage or a disadvantage.
Now when trying to add my new sdb1 partition as an extra repository, I got:
[root@nithog ~]# /usr/lib/ovs/ovs-makerepo usage: /usr/lib/ovs/ovs-makerepo <source> <shared> <description> source: block device or nfs path to filesystem shared: filesystem shared between hosts? 1 or 0 description: descriptive text to be displayed in manager
[root@nithog ovs]# /usr/lib/ovs/ovs-makerepo /dev/sdb1 0 "Repo on disk 2" ocfs2_hb_ctl: Unable to access cluster service while starting heartbeat mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted" Error mounting /dev/sdb1
Seems like the script expects something like a cluster, but I just have a standalone node… I think that this script is intended to add a shared repository to a cluster of nodes. No problem, let’s try to convert our standalone machine to a one-node cluster: create the file /etc/ocfs2/cluster.conf:
cluster: node_count = 1 name = ocfs2 node: ip_port = 7777 ip_address = 10.7.64.160 number = 1 name = nithog cluster = ocfs2
Note that the indented lines MUST start with a <TAB> and then the parameter with its value. After creating this file I could do:
[root@nithog ovs]# /etc/init.d/o2cb online ocfs2 Starting O2CB cluster ocfs2: OK
[root@nithog ovs]# /usr/lib/ovs/ovs-makerepo /dev/sdb1 0 "Repo on disk 2" Initializing NEW repository /dev/sdb1 SUCCESS: Mounted /OVS/877DECC5B658433D9E0836AFC8843F1B Updating local repository list. ovs-makerepo complete
As you can see, an extra subdirectory is created in the /OVS file system, with a strange UUID as its name. Under this directory my new file system /dev/sdb1 is mounted. This file system is a real new repository, because under /OVS/877DECC5B658433D9E0836AFC8843F1B you find as well the running_pool and seed_pool directories. It is also listed in /etc/ovs/repositories (but it is NOT recommended to edit this file manually).
Then I looked in the Oracle VM Manager (the java based web gui) but I didn’t find anything of this new repository. It looks as if this gui is not (yet) designed to handle multiple repositories. However I started to figure out if my new disk could really be used for virtual machines, and my results are:
- When creating a new virtual machine, you have no chance of specifying in which repository it has to come
- It seems to come in the repository where there is the most amount of free space (but I should do more testing to get 100% certainty)
- When adding a new disk to an existing virtual machine (an extra file on oracle-vm level) the file will come in the same repository, even the same directory as where the initial files of your virtual machine are located. If there is NOT enough free space on the disk, Oracle VM will NOT put your file in another repository on another disk.
- You can move the datafiles of your virtual machine to any other location while the machine is not running, and while changing the reference to the file in /etc/xen/<machine_name>
- So actually it looks that on xen-level you can put your vm datafiles in any directory; the concept of the repositories seems to be oracle-vm specific.
- So if you create a new virtual machine and Oracle puts it in the wrong repository, it is not difficult at all to move it afterwards to another filesystem/repostory. It just requires a little manual intervention. However it seems recommended to keep your machines always in an oracle-vm repository, in the running_pool, because only in that way it can be managed by the Oracle-vm gui.
I am sure that there are many things that have an abvious explanation, but I have to admit that I didn’t read the manuals of ocfs and oracle vm completely from the start to the end. Also I think that Oracle
Conclusion: Oracle VM seems to be capable of having multiple repositories on different disks, but the GUI is not ready to handle them. But with a minimum of manual intervention, it is easy to do all desired tasks in command-line mode.
This post is to warn you of a potential problem you may encounter. A combination where it definitely occurs is 10.2 Rac and ASM on Solaris. The problem is the following: due to a lot of db activity, no cleanup of archivelog files, multiple databases present and multiple instances per database, the amount of archivelog files has grown up to +100.000 in ASM (in my case). As a result it turns out that a query on v$asm_file lasts 1 minute or even more, up to 3 minutes. Not a problem on itself, wasn’t it that the query on v$asm_file turns out to block all file manipulation operations in ASM, including the creation/deletion of archivelogs or registering archivelogs in the controlfile. This generates CF enqueue waits in the databases using asm (for lgwr and arc) and very soon waits by user processes on log file sync becasue the lgwr is blocked. In this way your production may be frozen until the query on v$asm_file ends or is interrupted. Knowing that emagent can access v$asm_file, and creation and registering or archivelogs as well, this problem may occur very often, especially in data guard environments where a lot of archivelog manipulation is done.
Yesterday I had the uniqe opportunity to attend a Q&A session with Wim Coekaerts, the father of the whole linux story within Oracle. I agree completely with what Freek already posted in his blog (he attended as well). On top of that I want to add some personal thoughts as well. Read the rest of this entry »
If you are about to migrate to Oracle RAC, there is an important thing to know about the OCRmirror. As far as I know, this is documented nowhere in any Oracle doc. This post talks about the failover properties of the OCRmirror. Read the rest of this entry »
Recently I described the use of Oracle Streams to do an Oracle upgrade. It allows to create a new empty database of a higher version, and then use streams replication to keep it in sync with the old one. The final migration then consists of stopping the activity on the old database, make sure it is all replicated on the new, and then connect all clients to the new.
This is running fine now in test. However, sometimes you want that extra confirmation that the new database is really identical to the old. Personally, I know that Streams will do its work correctly, but I do have to prove this to others as well.
For small tables, you could easily write in the new database a “select from some_table minus select from some_table@old” and use this construction to verify that rows are identical in old and new database. However this will become a complex operation on large table, and the use of LOBs is even not possible over a database link.
So I came up with a plsql package to calculate a checksum of a table. After some initial thoughts, it turned out not to be so difficult. dbms_utility is my friend. It contains a function “get_hash_value” that returns a number calculated out of a string. The same string will always return the same number, and by choosing an output range between 0 and power(2,31), you will have a hard time finding two strings that return the same hash value.
The problem then reduces to finding a way to convert every column in a string. Easy for char and varchar2, ok for number, date and timestamp using a to_char function, but what about my CLOBs and BLOBs. Well, now dbms_crypto is my friend (10g). This package contains a “hash” function that accepts clob and blob as its input and returns a 128-bit hash (using MD4). The only problem is that it returns a RAW. So you need to be familiar with raw and hex stuff in order to know that the rawtohex function can convert this output to a string containing only hex characters (0..9,A..F), and the string is what we needed, right?
Now it is just a question of generating some dynamic sql, concatenating all output of all columns to eachother and calling the dbms_utility.get_hash_value on it. This returns a number for every row in the table. You can then just select sum(…) of this, to obtain a global hash value of the table.
My experience is that these functions are all very fast, and that the time to calculate the table hash value is nearly equal to the full tabe scan, which is the minimum you have to do in order to check *all* data in the table.
Applying this to streams, I look at the apply process to find the last apply scn, and then use my hash function to calculate the hash number on the tables “as of scn <last apply scn>”. If this returns the same result in my old and my new database, I am 100% sure that both tables are identical (except for a chance of 1 out of power(2,31), but it is still easier than that to win the lottery 2 times in a row…).
I do have another package to compare rows between two tables, with extended options, customization, logging and exact identification of the differences, (however it is slower than the table hash above) but that’s for later…