Archive for category Tip 'o the day

TOD – Keep Listening, Stop Logging

Every so often I get a call from a client because their database just up and stopped.  Generally that means only one of two things, either the LOG_ARCHIVE_DEST is full and the database(s) can no longer write out their deltas or the filesystem which contains the $ORACLE_HOME for said database is full.   In either case  Oracle will quiesce each database hosted from that home and no changes except startup and shutdown may be made to said database.

When the latter happens,   the first place I look is for log files, but not just any log files.  The alert log can grow rapidly, but it doesnt compare to the listener.log (or listener_<hostname>.log).  Since this logfile is a record of every SQL*Net connection to the database, on busy systems it can and does fill up rapidly.  Most systems no longer have a 2GB limit, thus the log can reach huge sizes and fill a filesystem before you know it.

What can be done?  You can rotate the log in some fashion or clear it (>listener.log will clear a file without removing it).  If deleted or moved it will recreate itself on the next write.  But what if you don’t care about logging all that information?  If it isn’t needed for auditing or tracing, you can shut it off by adding the following line to your listener.ora file:

LOGGING_LISTENER = OFF
or
LOGGING_LISTENER_<hostname>=OFF (if RAC or named listeners are used)

and then reload the listener, no need for a restart here.  At once logging will stop and all the free space on your filesystem will celebrate the fact that it will not be soon gobbled up by a rapidly growing listener.log.  If you determine that it is needed, you can remove the line (the default is to log) or set the value to ON and reload the listener.

2 Comments

TOD – ASM diskgroup space from any database

Have you ever noticed that v$asm views are available in all databases? This is true whether using ASM or not because the ASM home and software is the same as the database software. On this note, while the DB and ASM can share a home, they should not in order to preserve the ability to upgrade ASM separately from the database home. Simply, ASM should always be at or above the highest database software version, but that is another post.

Ok, now that I am back from that tangent. I have had users call me in a panic when they run ASM scripts in a database instead of the ASM instance. When querying v$asm_disk and checking on the physical disk/luns, it will always report 100% used:

  SELECT
      NVL(a.name, '[CANDIDATE]')                       disk_group_name
    , b.path                                           disk_file_path
    , b.total_mb                                       total_mb
    , (b.total_mb - b.free_mb)                         used_mb
    , ROUND((1- (b.free_mb / b.total_mb))*100, 2)      pct_used
  FROM
      v$asm_diskgroup a RIGHT OUTER JOIN v$asm_disk b USING (group_number)
  ORDER BY
     a.name;
Disk Group Name Path              File Size (MB) Used Size (MB) Pct. Used
--------------- ----------------- -------------- -------------- ---------
DGROUP1         ORCL:VOL001               33,791         33,791    100.00
                ORCL:VOL002               33,791         33,791    100.00
                ORCL:VOL003               33,791         33,791    100.00
                ORCL:VOL008               33,791         33,791    100.00
                ORCL:VOL005               33,791         33,791    100.00
                ORCL:VOL006               33,791         33,791    100.00
                ORCL:VOL007               33,791         33,791    100.00
                ORCL:VOL004               33,791         33,791    100.00
***************                   -------------- --------------
                                         270,328        270,328
 
                                  -------------- --------------
Grand Total:                             270,328        270,328

Which can induce great panic as it appears that ASM is full, but when properly run from ASM:

Disk Group Name Path              File Size (MB) Used Size (MB) Pct. Used
--------------- ----------------- -------------- -------------- ---------
DGROUP1         ORCL:VOL001               33,791         24,925     73.76
                ORCL:VOL002               33,791         24,911     73.72
                ORCL:VOL003               33,791         24,910     73.72
                ORCL:VOL008               33,791         24,921     73.75
                ORCL:VOL005               33,791         24,921     73.75
                ORCL:VOL006               33,791         24,905     73.70
                ORCL:VOL007               33,791         24,932     73.78
                ORCL:VOL004               33,791         24,919     73.74
***************                   -------------- --------------
                                         270,328        199,344
 
                                  -------------- --------------
Grand Total:                             270,328        199,344

Some users, including my own developers, need to know how much space is available on the diskgroup from within their own database. It was thought that we had to query asm via SQL*net or have ASM populate a file to be brought into the database to determine how much space was available. This idea has several problems, it is clumsy and complicated, and it is not up to date and must be refreshed often. I noticed, however, what while you cannot query v$asm_disk and get accurate results, you can query v$asm_diskgroup and get accurate results from any database.

  SELECT
      name                                     group_name
    , type                                     type
    , total_mb                                 total_mb
    , (total_mb - free_mb)                     used_mb
    , ROUND((1- (free_mb / total_mb))*100, 2)  pct_used
  FROM
      v$asm_diskgroup
  ORDER BY
     name;
 
Disk Group
Name             Type   Total Size (MB) Used Size (MB) Pct. Used
---------------- ------ --------------- -------------- ---------
DGROUP1          EXTERN         270,328        199,344     73.74
                        --------------- --------------
Grand Total:                    270,328        199,344

This query will report the same data from both the ASM instance as well as any databases on the same server. The interesting thing is, even if the database is NOT using ASM, it will report as above since every (10g or 11g) database on a host communicates with ASM via its own “mini-cluster” (ocssd) if not RAC and the RAC cluster if it is.

No Comments

TOD – Recovering a database past the current controlfile

Unless you have a great many database backups to manage, most DBAs using RMAN use the control file as a catalog instead of a catalog database. This simplifies management, but can make some types of recovery difficult. In a typical non-catalog RMAN recovery, the control file is restored and the instance is started for an RMAN restore and recovery. This method will provide a complete recovery to the point that the control file was backed up.

What happens, however, if that control file is lost? You can recover to the previous day’s backup (if you have it), but subsequent backups of the archive logs and then a full backup will be stored in a new control file backup. This is exactly what happened at a client of mine a few weeks ago. The backup script failed to back up the final archive log after the full backup. Due to Murphy’s Law, that was the exact backup that was needed to restore after a storage issue. Without the final archive log, the database threw the classic error:

ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: ‘+DGROUP1/dmlive/datafile/system.294.689876999′

I had no previous copy of the control file except that of the night before, I did, however, have all of the archive logs in an RMAN backup set. But how to tell the control file that they exist? With a catalog database, all the backupsets already recorded, but the control file method is only as good as its last writing by RMAN. There is a method, however.

Using the previous nights control file, use RMAN to restore and recover all database and archivelogs. Do NOT try to open the database. Find the filename of the backupset that contains the archivelogs that are needed, and run the following command withing RMAN:

catalog backuppiece 
       '/backups/DMLIVE_9497_1_689621187_2009_Mon_full';

Once this is complete, restart the recovery using the same control file to roll forward through all the newly cataloged archive logs. Once complete, open the database with the resetlogs option and you are all set.

Happy recoveries!

No Comments

TOD – Most cluster issues

As a DBA we have heard the old axiom that 80-90% of database performance issues are query related. I have a similar axiom about Oracle RAC: 90% of all cluster startup issues are either disk (voting/ocr) or interconnect related.

Today I forgot the second part of that axiom when I could not get two nodes of a three node cluster started. I ignored the interconnect part because I checked it first with ifconfig and the NIC was up on all three nodes. Secondly, because the error I would get in the ocssd.log file went on and on about:

clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(4715) LATS(1135926) Disk lastSeqNo(4715)

incrementing and rapidly enlarging the log. After changing several settings on the multipathing and in /etc/udev/rules.d, I tried the old test:

ping -b 1.1.1.255

That is, perform a broadcast ping on the full range of the interconnect. Nodes 2 and 3 could see each other and node 1 could see only itself. Once I fixed the private VLAN issue, all was well and the cluster came straight up.

The moral of this story is that just because it walks like a disk problem, talks like a disk problem and acts like a disk problem, in clusterware, it might just be a network issue.

No Comments

TOD – Coming in from the cold – Bringing datafiles into ASM

I received a panicked call the other day from a DBA who had a user create a new datafile on a cooked filesystem. All was not lost and it didn’t require a downtime to fix, however, it did require that tablespace to be taken offline for a time.

In all databases I create using ASM I set the parameter: DB_FILE_CREATE_DEST (and it’s cohort DB_CREATE_ONLINE_LOG_DEST_X, which I will discuss in the future) and set it to the value of the default diskgroup (in our case +DGROUP1). This makes life much easier for a dba as the file parameters are no longer needed:

 SQL> create tablespace asm_demo datafile size 20m;
 
Tablespace created.
 
SQL> select file_name from dba_data_files
  2  where tablespace_name = 'ASM_DEMO';
 
FILE_NAME
------------------------------------------------------------
+DGROUP1/qa/datafile/asm_demo.974.671209339

In the above example, if the db_create_file_dest parameter is set, the only information I need to create a tablespace is the tablespace name. Technically, even the size is not needed as a default size of 10m is used if the size is omitted.

This is all well and good, but what happens if someone comes along and creates a file the old fashioned way:

SQL> create tablespace test datafile '/home/oracle/test_01.dbf' 
  2> size 10m;

How do you get this file into asm? Elementary my dear Watson! First take the tablespace offline.

 SQL>  alter tablespace test offline;

Start RMAN and backup a copy of the datafile into ASM, since the tablespace is offline, no transactions will change the datafile. (I will leave out the extraneous rman info)

 RMAN> backup as copy tablespace 'TEST' format '+DGROUP1';

Tell the data dictionary to use the new copy.

 RMAN> switch tablespace 'TEST' to copy;
datafile 50 switched to datafile copy 
    "+DGROUP1/qa/datafile/test.973.671207181"

Finally, bring the tablespace back online, either from rman or sqlplus, and check where the datafile is no located.

SQL> alter tablespace test online;
 
Tablespace altered.
 
SQL> select file_name from dba_data_files 
  2> where tablespace_name = 'TEST';
 
FILE_NAME
-------------------------------------------------------
+DGROUP1/qa/datafile/test.973.671206913

That is it, now, admittedly taking an active tablespace offline can be a challenge and may require a downtime based on your application. But, the movement of the file into ASM is quite simple. In the end the DBA is happy and the user that created the cooked datafile was educated and given 50 lashes with a wet noodle.

No Comments

TOD – olsnodes hangs temporarily

In using rac, I have found it handy to modify oraenv to use the olsnodes -n -l command to find the local node number and append it to the database name for the real sid.

/home/oracle:()$ . oraenv
ORACLE_SID = [oracle] ? qa
/home/oracle:(qa1)$ echo $ORACLE_SID
qa1

I find this easier than manually setting the sid as it is consistent on each node. One issue I have run into is that after a node has been around for a while there can be a lag time in using oraenv to set the sid.

Tracing the problem, I found it would hang for several seconds on the olsnodes command. Minor, but annoying, especially when in a hurry. In trying to find the cause I discovered that olsnodes would write a logfile to the $CRS_HOME/log//client directory. If there are a large number of log files (css*.log) it will slow down as unix has to create new inode for the new file and unix has to take the number of files in a directory into account when allocating a new inode.

The obvious solution is to remove these files, in most cases an “rm” command will not work as the file list is too long so a find command would be used:

 find . -name "*.log" -exec rm -f {} \;

The best resolution would be to create a cron job that would remove all old logs:

 00 03 * * * /usr/bin/find /oracle/product/10.2.0/crs_1/log/lx52/client \( -name "css*.log" -o -name "*.trc" \) -mtime +1 -exec /bin/rm -f {} \;

In the example above at 3am we find all the css*.log and *.trc files older than midnight yesterday and remove them. Based on running at 3am, it would remove all files over 27 hours old.

No Comments

TOD – Installing Oracle fails citing “Not enough space”

I perform a great many installations of Oracle both RAC and non-rac, in fact, I have installed RAC on over 200 clusters now.  In the process of this I have created silent install processes to help make a cluster install easier on everyone involved.   From time to time, however, the silent install (and sometimes an interactive install) fails with the error “You may not have enough space on your drive for a successful install. Do you still want to continue?”  This is all well and good if interactive as you can say yes or no, but in a silent install all is lost and you will have to start over.

But what causes this error? The obvious is not enough space in the filesystem on which you are installing the Oracle software, but not always.  Oracle typically espects anywhere from 1.6 to 3.5G free on that filesystem depending on product and platform.  You may run into a case where there is plenty of room, so why did it fail?  What the error doesn’t tell you is where the installer is looking for that space.  It is not only your installation (Oracle_base) filesystem, but also /var and /tmp.  If there have been several other installs, successfull or not, Oracle uses /tmp and /var/tmp to store installer information in directories in the format “OraInstallYYYY-MM-DD_HH-MM-SS{AM|PM}.  These directories are not automatically removed at the end of installations and not if an install fails.   Oracle requires typically about 400M in these filesystems, if not found, it will fail with the error above.  So remember to clean out /var/tmp and /tmp after your installations!

,

No Comments

TOD – libawt.so : libXp.so.6 error for different executables

I have seen errors like:

Exception in thread “main” java.lang.UnsatisfiedLinkError: /oracle/product/10.2.0/asm_1/jdk/jre/lib/i386/libawt.so: libXp.so.6: cannot open shared object file: No such file or directory

happen on new linux installs (OEL5 and RHEL5) for srvctl, netca and several other commands, the problem is that several Oracle executables are using X library calls that are now deprecated in the RHEL/OEL5 versions of Linux.   The simplest fix is to add the xorg-x11-deprecated-libs-6.8.2-1.EL.33.i386.rpm with the following command (as root):

rpm -Uvh xorg-x11-deprecated-libs-6.8.2-1.EL.33.i386.rpm

It is the same rpm for 64 and 32bit O/S installations.   Once applied, the errors go away and the application should run normally.

, , , ,

No Comments

TOD – Compressing exports on the fly

There is a tip on compressing those wily exports, when there isn’t enough space to store an uncompressed export. I have been doing this since I got off of VMS and on to unix back at V7 and it works fine all the way through 11g exp and datapump.

Perform all of these commands as the oracle user.
1. create a named pipe in unix:

 $ mknod /tmp/pp p

2. Send the output of a compression command (gzip, compress, etc) to the pipe in the background:

$ gzip < /tmp/pp > /backup/full_exp.dmp.gz &
or
$ compress < /tmp/pp > /backup/full_exp.dmp.Z &

3. Send the output of your export command to the named pipe:

 $ exp sys/joepassword file=/tmp/pp full=y ...

That is all there is to it, to import from the same compressed file:
1. make the pipe again if not already there
2. start up a background process to run the uncompress:

$ gunzip < /backup/full_exp.dmp.gz > /tmp/pp &

3. finally import as you normally would:

 $ imp sys/joepassword file=/tmp/pp full=y ...

You can always uncompress the files manually then import if you wish.

, , , ,

No Comments

TOD – How to check if a LUN is used by ASM

ASM is down, init+ASM.ora is gone and you need to see which luns are owned by ASM, if on linux you can generally use the “service oracleasm” command to find the luns that have been marked for use with ASM.  On other platforms the oracleasm service is not available, how then to check?   You can use the od command to check the first few lines of a lun, it will display “ORCLDISK” and the diskgroup in the first five lines if part of and ASM diskgroup:

/dev:()$ od -c /dev/hdiskpower10 | more         
0000000   \0 202 001 001  \0  \0  \0  \0 200  \0  \0  \0 211 341   $ 207
0000020   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040   O   R   C   L   D   I   S   K  \0  \0  \0  \0  \0  \0  \0  \0
0000060   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000100   \n 020  \0  \0  \0  \0 001 003 D   G   R   O   U   P   1  _
0000120    0   0   0   0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000140   \0  \0  \0  \0  \0  \0  \0  \0   D   G   R   O   U   P   1  \0
0000160   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000200   \0  \0  \0  \0  \0  \0  \0  \0   D   G   R   O   U   P   1   _
0000220    0   0   0   0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000240   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

You can see above that this lun is part of diskgroup dgroup1.

No Comments