The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Conclusion

Oct 15, 2009

This is a follow-up of chapter 5.
The most important conclusion of this story is that it is perfectly possible to configure your Oracle RAC cluster with 2 storage boxes in a safe way. You just need an independent location for the 3rd voting disk; if you have that, you can be sure that your cluster will keep running when one of the storage boxes fails. You will even be able to repair the configuration without downtime after e.g. buying a new storage box (call Uptime for good prices… :) ).
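
As an illustration of that online repair, a minimal sketch (the device path is a hypothetical example; run it as root and check the exact ocrconfig syntax for your Clusterware version):

# Point the clusterware to a replacement ocrmirror device on the new storage box,
# while crs keeps running on all nodes
ocrconfig -replace ocrmirror /dev/oracle/ocrmirror_new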

So were all these tests really needed? Yes, I do think so, for the following reasons:


The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 5

Oct 12, 2009

Scenario 5: Loss of ocrmirror from non-ocr-master – reloaded

This is a follow-up of chapter 4.
In this final scenario we do the same thing as in scenario 4, i.e. while crs is running on both nodes, we hide the ocrmirror from the non-ocr-master node, which is now node 2.
So node 1 is the master; we hide the ocrmirror from node 2 and we verify on node 2:

(nodeb01 /app/oracle/crs/log/nodeb01) $ dd if=/dev/oracle/ocrmirror of=/dev/null bs=64k count=1
dd: /dev/oracle/ocrmirror: open: I/O error
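
To cross-check how the clusterware itself reports the OCR devices at this point, ocrcheck can be run on node 2 as well (output omitted here; it lists both registered devices and the result of the integrity check):

(nodeb01 /app/oracle/crs/log/nodeb01) $ ocrcheck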

What happens?
Read the rest of this entry »


The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 4

Oct 7, 2009

Scenario 4: Loss of ocrmirror from the non-OCR MASTER

This is a follow-up of chapter 3.
Let’s do the same thing as in scenario 3, but now hiding the lun from the node that is NOT the OCR MASTER, while crs is running on both nodes.
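
To know which node NOT to pick, you first have to know which node is the OCR MASTER. One way to check (a sketch on my side; the exact message text differs per Clusterware version) is to search the crsd log for the master messages:

# Hypothetical check: the crsd log mentions which node became the OCR MASTER
# (the CRS home path is taken from the prompts shown in the other chapters)
grep -i "OCR MASTER" /app/oracle/crs/log/$(hostname)/crsd/crsd.log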

What happens?
Read the rest of this entry »


The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 3

Oct 2, 2009

Scenario 3: Loss of OCRmirror from the OCR MASTER only

This is a follow-up of chapter 2.

As we have seen in scenario 1, the OCR MASTER will update the vote count. Now let’s hide the ocrmirror from only one node: the node that is the OCR MASTER, while the other node continues to see the ocrmirror. Will CRS get confused by this?

Note: while doing this test, crs is running on both nodes.
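
If you want to watch that vote count yourself, one rough approach is to dump the raw header of the OCR device before and after the test and diff the two dumps (a sketch only: the device path is an assumption and the header format, including where the vote count lives, is not documented):

# Capture the first blocks of the OCR device; repeat after the test and diff the files
dd if=/dev/oracle/ocr bs=4k count=1 2>/dev/null | od -x > /tmp/ocr_header_before
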
Read the rest of this entry »


The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 2

Sep 19, 2009

Scenario 2: loss of ocrmirror, both nodes down

(This is the follow-up of chapter 1)

Let’s investigate the vote count a little further by doing the following test:

  • First stop crs on both nodes (see the commands after this list)
  • Then make the lun with ocrmirror unavailable to both nodes
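
The first step comes down to the following command on each node (as root); the second step is done on the storage/SAN side, so there is no single host command for it:

# On node 1 and on node 2: stop the complete clusterware stack
crsctl stop crs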

What happens?
Read the rest of this entry »


The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Chapter 1

Sep 1, 2009

Scenario 1: loss of ocrmirror, both nodes up

(This is the follow-up of the article “Introduction”)

Facts
  • CRS is running on all nodes (a quick check is shown after this list)
  • The storage box containing the OCRmirror is made unavailable to both hosts (simulating a crash of one storage box).
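
A quick way to confirm the first fact on each node before pulling the storage (a sketch):

# Reports whether CSS, CRS and EVM are healthy on this node
crsctl check crs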

What happens?
Read the rest of this entry »


The ultimate story about OCR, OCRMIRROR and 2 storage boxes – Introduction

Aug 28, 2009

Some time ago I wrote a blog about stretched clusters and the OCR. The final conclusion at that time was that there was no easy way to keep your OCR safe on both storage boxes, and hence I advised against clusters with 2 storage boxes. However, after some more investigation I may have to change my mind. I did extensive testing on the OCR and in this blog I want to share my experiences. Read the rest of this entry »


A really recommended ASM patch – failing lun

Aug 6, 2008

The following is a real life experience about failing disks/luns and how ASM reacts to this. We used 10.2.0.4 on Solaris with an HDS storage box and MPxIO. We made an ASM diskgroup of 2 mirrored disks. Then we made the luns unavailable to the hosts (hide lun). The result was not really what we expected. Read the rest of this entry »


Oracle VM and multiple local disks

Jun 9, 2008

For my Oracle VM test environment I have a server available with multiple internal disks of different sizes and speeds. So I was wondering whether it is possible to use all these disks together for my virtual machines in Oracle VM.

If all disks had been the same size and speed, I could easily have used the internal RAID controller to put them in a mirror, stripe or RAID 5 and end up with one large volume, alias disk, for my Oracle VM. However, due to the different characteristics of the disks (speed/size), this is not a good idea. So I started looking in Oracle VM Manager (the Java console) to see what is possible.

It soon turned out that Oracle VM is designed for a different architecture: the intended setup is to have a (large) SAN box with shared storage that is available to multiple servers. All these servers can then be put in a server pool, sharing the same storage. This setup allows live migration of running machines to another physical server. Of course this makes sense, because it fits nicely in the concept of grid computing: if any physical server fails, just restart your virtual machine on another one, and add machines according to your performance needs. But it doesn’t help me: I don’t have one storage box with multiple servers, I have one server with multiple disks.

So I started browsing a little through the executables of the OVM installation, and under /usr/lib/ovs I found the ovs-makerepo script. As far as I can tell (there is not much clear documentation on this, so this is what I could piece together from the internet), the architecture is as follows: when installing OVM you get a /boot, a / and a swap partition (just as in traditional Linux), and OVM requires one large partition to be used for virtual machines, which is mounted under /OVS. In this partition you find a subdirectory “running_pool”, which contains all the virtual machines that you have created and can start, and a subdirectory “seed_pool”, which contains templates you can start from when creating new machines. There are also “local”, “remote” and “publish_pool”, but they were irrelevant for me at the moment and I didn’t try to figure out what they are used for.
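
For reference, the top level of /OVS then looks roughly like this (an illustrative listing, based only on the directories described above):

   [root@nithog ~]# ls /OVS
   local  publish_pool  remote  running_pool  seed_pool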

With this in mind I can install Oracle VM on my first disk and end up with 4 partitions on /dev/sda:

   Filesystem 1K-blocks     Used Available Use% Mounted on
   /dev/sda1     248895    25284    210761  11% /boot
   (sda2 is swap)
   /dev/sda3    4061572   743240   3108684  20% /
   /dev/sda4   24948864 22068864   2880000  89% /OVS

Now I want to add the space on my second disk (/dev/sdb) to this setup. So first I create one large partition on the disk using fdisk. Then I create an OCFS2 file system on it as follows:

[root@nithog ovs]# mkfs.ocfs2 /dev/sdb1
mkfs.ocfs2 1.2.7
Filesystem label=
Block size=4096 (bits=12)
Cluster size=4096 (bits=12)
Volume size=72793694208 (17771898 clusters) (17771898 blocks)
551 cluster groups (tail covers 31098 clusters, rest cover 32256 clusters)
Journal size=268435456
Initial number of node slots: 4
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 4 block(s)
Formatting Journals: done
Writing lost+found: done
mkfs.ocfs2 successful

Initially I had created the file system as ext3, which worked well. However, there was one strange thing. This is what you get:

  • Create a new paravirtualized Linux virtual machine in this new (ext3-based) repository (see later for how exactly)
  • Specify a disk of e.g. 2 GB
  • Complete the wizard
  • This prepares a machine on whose console you can start the Linux installer to install the machine (do not start installing yet)
  • Now look in …/running_pool/machine_name and you see a file of 2 GB
  • Now run du -sk on …/running_pool/machine and see that only 20 KB is actually used
  • From the moment you start partitioning the disk inside the virtual machine, the output of “du -sk” grows by the same amount as the data you really put in it. So it behaves a bit like ‘dynamic provisioning’.
  • Note however that ls -l shows a file of 2 GB at all times (a quick check is sketched right after this list)
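
A quick way to see this for yourself (a sketch; the machine name “myvm” and the image file name are hypothetical examples):

   # Apparent file size: always reported as the full 2 GB
   ls -lh /OVS/running_pool/myvm/System.img
   # Space actually allocated: only grows as data is written inside the VM
   du -sk /OVS/running_pool/myvm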

I don’t know at the moment whether this behaviour is caused by the file system being ext3, but anyway, I leave it up to you to judge whether this is an advantage or a disadvantage.

Now when trying to add my new sdb1 partition as an extra repository, I got:

Usage:

[root@nithog ~]# /usr/lib/ovs/ovs-makerepo
 usage: /usr/lib/ovs/ovs-makerepo <source> <shared> <description>
        source: block device or nfs path to filesystem
        shared: filesystem shared between hosts?  1 or 0
        description: descriptive text to be displayed in manager

Execution:

   [root@nithog ovs]# /usr/lib/ovs/ovs-makerepo /dev/sdb1 0 "Repo on disk 2"
   ocfs2_hb_ctl: Unable to access cluster service while starting heartbeat
   mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"
   Error mounting /dev/sdb1

Seems like the script expects something like a cluster, but I just have a standalone node… I think that this script is intended to add a shared repository to a cluster of nodes. No problem, let’s try to convert our standalone machine to a one-node cluster: create the file /etc/ocfs2/cluster.conf:

cluster:
        node_count = 1
        name = ocfs2
node:
        ip_port = 7777
        ip_address = 10.7.64.160
        number = 1
        name = nithog
        cluster = ocfs2

Note that the indented lines MUST start with a <TAB> and then the parameter with its value. After creating this file I could do:

   [root@nithog ovs]# /etc/init.d/o2cb online ocfs2
   Starting O2CB cluster ocfs2: OK

and then
   [root@nithog ovs]# /usr/lib/ovs/ovs-makerepo /dev/sdb1 0 "Repo on disk 2"
   Initializing NEW repository /dev/sdb1
   SUCCESS: Mounted /OVS/877DECC5B658433D9E0836AFC8843F1B
   Updating local repository list.
   ovs-makerepo complete

As you can see, an extra subdirectory is created in the /OVS file system, with a strange UUID as its name. Under this directory my new file system /dev/sdb1 is mounted. This file system is a real new repository, because under /OVS/877DECC5B658433D9E0836AFC8843F1B you also find the running_pool and seed_pool directories. It is also listed in /etc/ovs/repositories (but it is NOT recommended to edit this file manually).

Then I looked in Oracle VM Manager (the Java based web GUI), but I didn’t find any trace of this new repository. It looks as if the GUI is not (yet) designed to handle multiple repositories. Nevertheless, I started to figure out whether my new disk could really be used for virtual machines, and these are my results:

  • When creating a new virtual machine, you have no way of specifying which repository it should go to
  • It seems to end up in the repository with the most free space (but I should do more testing to be 100% certain)
  • When adding a new disk to an existing virtual machine (an extra file at Oracle VM level), the file ends up in the same repository, even the same directory, as the initial files of your virtual machine. If there is NOT enough free space on that disk, Oracle VM will NOT put your file in another repository on another disk.
  • You can move the datafiles of your virtual machine to any other location while the machine is not running, as long as you also change the reference to the file in /etc/xen/<machine_name> (see the sketch after this list)
  • So it actually looks like at Xen level you can put your VM datafiles in any directory; the concept of repositories seems to be Oracle VM specific.
  • So if you create a new virtual machine and Oracle puts it in the wrong repository, it is not difficult at all to move it afterwards to another filesystem/repository. It just requires a little manual intervention. However, it seems recommended to always keep your machines in an Oracle VM repository, in the running_pool, because only then can they be managed by the Oracle VM GUI.
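
For completeness, a sketch of such a manual move (the machine name, image file name and target repository are hypothetical; shut the virtual machine down first):

   # Move the disk image to the repository on the second disk
   mkdir -p /OVS/877DECC5B658433D9E0836AFC8843F1B/running_pool/myvm
   mv /OVS/running_pool/myvm/System.img \
      /OVS/877DECC5B658433D9E0836AFC8843F1B/running_pool/myvm/
   # Then update the matching entry in the disk = [ ... ] line of /etc/xen/myvm, e.g.
   #   'file:/OVS/877DECC5B658433D9E0836AFC8843F1B/running_pool/myvm/System.img,xvda,w'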

I am sure that many of these things have an obvious explanation, but I have to admit that I didn’t read the OCFS2 and Oracle VM manuals completely from start to end.

Conclusion: Oracle VM seems to be capable of having multiple repositories on different disks, but the GUI is not ready to handle them. With a minimum of manual intervention, however, it is easy to do all the desired tasks on the command line.


ASM and large amount of files: follow-up

Apr 20, 2008

As promised, I will keep you up to date about the problem from my previous post on the issues with ASM on Solaris when a large number of files exists (or has existed). I can confirm that there is now a fix that completely solves the problem.

Read the rest of this entry »