Archive for July, 2009
TOD – Top 5 reasons the cluster will not start on a node.
Posted by Jay Caviness in general on July 28th, 2009
I have mentioned before the top two, but here are the top 5 most common reasons that the CRS, CSS, EVM, etc, processes will not start on a node:
1. Network issue – NIC card might be unavailable, VIP assigned to a node on a different system, switch may be bad, etc.
2. Storage issue – verify that all the voting and OCR disks/luns are accessible on all nodes and are owned appropriately (OCR – root:dba & 640, Voting Disks – oracle:dba * 640). You can test by dumping with dd and piping to strings – dd if=/dev/raw/raw2 | strings .
3. Too many sockets – look in /tmp/.oracle and /var/tmp/.oracle and remove any and all socket files. These can build up between reboots after many failed attempts to start the cluster.
4. Filesystem full – if the filesystem containing CRS_HOME is full, or becomes full during the start of a cluster, it will just hang for the 600 second timeout and die. If it can’t write a log, it won’t start up.
5. /etc/oracle missing or corrupt. The /etc/oracle directory is created when root.sh is run for CRS. it contains the OCR location and other files needed by the cluster. If CRS cannot find the location of the OCR files, it will look in default locations and then simply hang while searching for the default 600 seconds.
As always, search out the logs in $CRS_HOME/log/
Keep on RAC’n!
New format for Grumpy-DBA.com
Posted by Jay Caviness in general on July 14th, 2009
I am experimenting with a new format on my blog site. I had received some comments that with the three column format, long text boxes were being cut off. I started using this theme with only one side column to allow for a wider post format.
TOD – ASM diskgroup space from any database
Posted by Jay Caviness in ASM, Tip 'o the day on July 14th, 2009
Have you ever noticed that v$asm views are available in all databases? This is true whether using ASM or not because the ASM home and software is the same as the database software. On this note, while the DB and ASM can share a home, they should not in order to preserve the ability to upgrade ASM separately from the database home. Simply, ASM should always be at or above the highest database software version, but that is another post.
Ok, now that I am back from that tangent. I have had users call me in a panic when they run ASM scripts in a database instead of the ASM instance. When querying v$asm_disk and checking on the physical disk/luns, it will always report 100% used:
SELECT
NVL(a.name, '[CANDIDATE]') disk_group_name
, b.path disk_file_path
, b.total_mb total_mb
, (b.total_mb - b.free_mb) used_mb
, ROUND((1- (b.free_mb / b.total_mb))*100, 2) pct_used
FROM
v$asm_diskgroup a RIGHT OUTER JOIN v$asm_disk b USING (group_number)
ORDER BY
a.name;
Disk Group Name Path File Size (MB) Used Size (MB) Pct. Used
--------------- ----------------- -------------- -------------- ---------
DGROUP1 ORCL:VOL001 33,791 33,791 100.00
ORCL:VOL002 33,791 33,791 100.00
ORCL:VOL003 33,791 33,791 100.00
ORCL:VOL008 33,791 33,791 100.00
ORCL:VOL005 33,791 33,791 100.00
ORCL:VOL006 33,791 33,791 100.00
ORCL:VOL007 33,791 33,791 100.00
ORCL:VOL004 33,791 33,791 100.00
*************** -------------- --------------
270,328 270,328
-------------- --------------
Grand Total: 270,328 270,328Which can induce great panic as it appears that ASM is full, but when properly run from ASM:
Disk Group Name Path File Size (MB) Used Size (MB) Pct. Used
--------------- ----------------- -------------- -------------- ---------
DGROUP1 ORCL:VOL001 33,791 24,925 73.76
ORCL:VOL002 33,791 24,911 73.72
ORCL:VOL003 33,791 24,910 73.72
ORCL:VOL008 33,791 24,921 73.75
ORCL:VOL005 33,791 24,921 73.75
ORCL:VOL006 33,791 24,905 73.70
ORCL:VOL007 33,791 24,932 73.78
ORCL:VOL004 33,791 24,919 73.74
*************** -------------- --------------
270,328 199,344
-------------- --------------
Grand Total: 270,328 199,344Some users, including my own developers, need to know how much space is available on the diskgroup from within their own database. It was thought that we had to query asm via SQL*net or have ASM populate a file to be brought into the database to determine how much space was available. This idea has several problems, it is clumsy and complicated, and it is not up to date and must be refreshed often. I noticed, however, what while you cannot query v$asm_disk and get accurate results, you can query v$asm_diskgroup and get accurate results from any database.
SELECT
name group_name
, type type
, total_mb total_mb
, (total_mb - free_mb) used_mb
, ROUND((1- (free_mb / total_mb))*100, 2) pct_used
FROM
v$asm_diskgroup
ORDER BY
name;
Disk Group
Name Type Total Size (MB) Used Size (MB) Pct. Used
---------------- ------ --------------- -------------- ---------
DGROUP1 EXTERN 270,328 199,344 73.74
--------------- --------------
Grand Total: 270,328 199,344This query will report the same data from both the ASM instance as well as any databases on the same server. The interesting thing is, even if the database is NOT using ASM, it will report as above since every (10g or 11g) database on a host communicates with ASM via its own “mini-cluster” (ocssd) if not RAC and the RAC cluster if it is.
TOD – Recovering a database past the current controlfile
Posted by Jay Caviness in Backup and Recovery, Tip 'o the day on July 10th, 2009
Unless you have a great many database backups to manage, most DBAs using RMAN use the control file as a catalog instead of a catalog database. This simplifies management, but can make some types of recovery difficult. In a typical non-catalog RMAN recovery, the control file is restored and the instance is started for an RMAN restore and recovery. This method will provide a complete recovery to the point that the control file was backed up.
What happens, however, if that control file is lost? You can recover to the previous day’s backup (if you have it), but subsequent backups of the archive logs and then a full backup will be stored in a new control file backup. This is exactly what happened at a client of mine a few weeks ago. The backup script failed to back up the final archive log after the full backup. Due to Murphy’s Law, that was the exact backup that was needed to restore after a storage issue. Without the final archive log, the database threw the classic error:
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: ‘+DGROUP1/dmlive/datafile/system.294.689876999′
I had no previous copy of the control file except that of the night before, I did, however, have all of the archive logs in an RMAN backup set. But how to tell the control file that they exist? With a catalog database, all the backupsets already recorded, but the control file method is only as good as its last writing by RMAN. There is a method, however.
Using the previous nights control file, use RMAN to restore and recover all database and archivelogs. Do NOT try to open the database. Find the filename of the backupset that contains the archivelogs that are needed, and run the following command withing RMAN:
catalog backuppiece
'/backups/DMLIVE_9497_1_689621187_2009_Mon_full';Once this is complete, restart the recovery using the same control file to roll forward through all the newly cataloged archive logs. Once complete, open the database with the resetlogs option and you are all set.
Happy recoveries!