
TOD – Top 5 reasons the cluster will not start on a node.

I have mentioned the top two before, but here are the five most common reasons that the CRS, CSS, EVM, and related processes will not start on a node:

1. Network issue – the NIC might be unavailable, the VIP might be assigned to a node on a different system, the switch may be bad, etc.
2. Storage issue – verify that all the voting and OCR disks/LUNs are accessible on all nodes and are owned appropriately (OCR – root:dba & 640, voting disks – oracle:dba & 640). You can test by dumping a device with dd and piping the output to strings: dd if=/dev/raw/raw2 | strings
3. Too many sockets – look in /tmp/.oracle and /var/tmp/.oracle and remove any and all socket files. These can build up between reboots after many failed attempts to start the cluster.
4. Filesystem full – if the filesystem containing CRS_HOME is full, or becomes full while the cluster is starting, the startup will hang for the 600-second timeout and then die. If it can’t write a log, it won’t start up.
5. /etc/oracle missing or corrupt – the /etc/oracle directory is created when root.sh is run for CRS. It contains the OCR location and other files needed by the cluster. If CRS cannot find the location of the OCR files, it will look in default locations and then simply hang while searching for the default 600 seconds.

As always, check the logs in $CRS_HOME/log/. The alert.log and the files in the crsd, cssd, and client directories, while sometimes difficult to decipher, will help you find the reason for the cluster problems.
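
When there are many log files, it can help to see which ones were written most recently. Here is a small sketch, assuming CRS_HOME is set in the environment; the function name newest_logs is made up for illustration. It walks the log directory and prints the most recently modified files first.

```python
import os


def newest_logs(log_root, limit=10):
    """Return up to `limit` file paths under log_root, newest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                entries.append((os.path.getmtime(full), full))
            except OSError:
                continue  # file rotated away while we were walking
    entries.sort(reverse=True)
    return [path for _mtime, path in entries[:limit]]


if __name__ == "__main__":
    crs_home = os.environ.get("CRS_HOME")
    if not crs_home:
        print("CRS_HOME is not set")
    else:
        for path in newest_logs(os.path.join(crs_home, "log")):
            print(path)
```

The files at the top of the list are usually the ones to read first after a failed start.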

Keep on RAC’n!
