Archive for July, 2008
New – Scripts page
Posted by Jay Caviness in general on July 31st, 2008
I have been asked to provide some of the scripts I use as a DBA. So I have created a new page “Scripts” which will start off with a few of the ASM scripts I use. I will continually add to the page with scripts for ASM, sql, PL/SQL and shell scripts. If there is a particular function you would like, let me know, if I don’t have it, I am sure I can find it or come up with one. The format may change here and there to make it more efficient. Cheers!
TOD – Rollin’, Rollin’, Rollback that data
Posted by Jay Caviness in Tip 'o the day, general on July 30th, 2008
A long insert, update or delete has just finished, but you have to roll it back. Once you type “rollback” you sit and wait, and wait and wait. There is a query to see how much work is left, it won’t give you the amount of time left, but you can get an ideal of how many rows and blocks are left.
SELECT used_urec, used_ublk FROM v$transaction;
It may take a little work to figure out which transaction is yours if there are several uncommitted transactions at the time, but if it is a big rollback it will be easy to see.
TOD – Trust, but verify that private vlan
Posted by Jay Caviness in RAC, Tip 'o the day on July 25th, 2008
To properly support traffic across the interconnect must be sequestered on a private vlan (non-routeable). What this means is that 1.1.1.1 (commonly used for interconnects) on one node cannot see 1.1.1.1 on a different node/cluster. I have seen, this very week when it is not quite right, the first node could ping/ssh to 1.1.1.2 on the second node, but when the second node tried to ssh back to 1.1.1.1 the login session went to an entirely different server. No wonder the install threw a hissy fit!
One way to check is to run ifconfig (or similar tool on other platforms) for the NIC running the interconnect, there will be a broadcast ip associated with it. Usually something like 1.1.1.255. Take that IP and run the following:
ping -b 1.1.1.255
If the result is similar to:
/home/oracle:(+ASM2)$ ping -b 1.1.1.255 WARNING: pinging broadcast address PING 1.1.1.255 (1.1.1.255) 56(84) bytes of data. 64 bytes from 1.1.1.3: icmp_seq=0 ttl=64 time=0.040 ms 64 bytes from 1.1.1.1: icmp_seq=0 ttl=64 time=0.101 ms (DUP!) 64 bytes from 1.1.1.2: icmp_seq=0 ttl=64 time=0.244 ms (DUP!)
and .1-.3 are nodes in your cluster, then it is a good bet that they are on a private vlan. However (Caveat warning!), it does not guarantee it, it may be the case that that network is currently only used for those three hosts above and that others could be added later. If you see other subnets as well as the interconnect subnet, then you are assured that it is NOT a private vlan.
One final tip, don’t put the interconnect IP/name in DNS, it should only be able to be seen from the hosts involved in the cluster.
TOD – Moving a VIP in an emergency
Posted by Jay Caviness in Tip 'o the day on July 24th, 2008
RAC is generally very stable, but not always, those of us that have been in this business any length of time have seen many of the “quirks” or as Oracle calls them “features” that can cause a cluster to behave in bizarre fashion. One that I have run into more than once (though still rarely) is a case where the VIP does not fail over on it’s own after a node is lost or down. If this happens, I guarantee your phone will start ringing. Read the rest of this entry »
Did you ever have one of those projects?
Posted by Jay Caviness in general on July 22nd, 2008
I have been working the last several months on a project to setup an HP blade enclosure, ISCSI, DM multipathing, Grid Control, OAS and RAC on VM. Each step as been a new adventure. Now, don’t get me wrong I love a challenge and learning new things, but sometimes, well, it can drive a person a little nuts. Instead of trying to explain it all in words, I thought a 47 second video presentation would be easer.
Parameter, what parameter?
Posted by Jay Caviness in RAC on July 16th, 2008
Interesting problem encountered recently at a client’s site. They have a four-node linux rac cluster running 10.2.0.3 and were experiencing instance evictions due to running out of shared memory now and again. While it was odd enough that at times the 4030 error with actually kill an instance and force the cluster to restart it, the real problem was why were they getting 4030’s in the first place? Their SGA_TARGET was set to 2800m and SGA_MAX_SIZE was set to 3G. How could it all be used up? Well my friend, the truth is stranger than fiction.
It didn’t come clear until the spfile was actually dumped for review. The SGA_TARGET was set in two ways:
live1.sga_target=1800m live2.sga_target=1800m live3.sga_target=1800m live4.sga_target=1800m *.sga_target=2800m
Well, if you did a show parameter sga_target you got 2800M, but since they were set instance specific, they were only really getting 1800m which was not enough for their applications to use. Very odd, it isn’t listed as a bug, but it never would have been found until the spfile was created. So watch out for those instance specific parameters if you are also using a database-wide parameter as well!
TOD – TNS-12560 error
Posted by Jay Caviness in Tip 'o the day on July 10th, 2008
Don’t you just hate it when you are all set to go and your client (or database) can’t quite make that connection to the database? There can be many causes to the TNS-12560 error, but one that is the most common.
12560, 00000, “TNS:protocol adapter error”
// *Cause: A generic protocol adapter error occurred.
Most often it is due to the port being used for the sqlnet connection not being available. You can test it with telnet:
telnet joeserver 1521 will simply hang or return an error
If you can verify that the port is not in use by anything else, your firewall may be running, you can shut it off in linux with the command (as root): /etc/init.d/iptables stop
TOD – When will that backup/recovery finish?
Posted by Jay Caviness in Tip 'o the day on July 8th, 2008
If you have ever had to recover a production database you know what pressure is. The worst of it is generally some member of management coming in every three minutes to find out when it will be done. Well, with this quick query you can get a rough estimation of when a recovery (or backup) done with RMAN will finish:
set lines 132
col message for a100
SELECT time_remaining, message
FROM v$session_longops
WHERE substr(opname,1,4)='RMAN' and
time_remaining > 0;The query returns time remaining in seconds, generally (if default parallelism is set to 1) you will see two rows, one for the overall backup/recovery and one for the current backup set being recovered. If parallelism > 0 then you will see that number of rows for each backup set being recovered by RMAN processes.
Warning: Take the number with a grain of salt, many things can affect it, but if you boss wants to know you can always take the time_remaining and multiply by 3 like Mr. Scott. That way, me laddy, you will look like a miracle worker.
Are You Certifiable?
Posted by Jay Caviness in general on July 7th, 2008
With nearly 20 years as an Oracle DBA I have been asked many times what my opinion of Oracle Certified Professional certifications. Well, I will tell you, not a lot. Let me backtrack a little bit first, however. Back in 1997 I was working for Oracle Support. I was one of the few that was a DBA before I worked for Oracle support with over seven years of experience. Oracle had just released the first certification for Oracle 7.3 and we were offered the opportunity to take all four tests for free (General DBA, sql & pl/sql, backup/recovery and tuning). I thought, “why not” and took and passed all four tests, the biggest problem I faced was over thinking a question. I became the 21st certified Oracle DBA and got a nice marble pyramid and leather jacket from Oracle.
All that is well and good, but after that I never took any of the upgrade tests or new certifications. One simple reason, they don’t mean a whole lot if you are an experienced DBA. What a certification tells me is that you can read a book and perform the basic, rote tasks a DBA is expected to perform, but it doesn’t tell me if someone really knows what being a DBA is really all about. I have performed my share of interviews, both as and employee and pre-screening for various companies, and I can tell you that a certification does not mean much if they cannot solve real world problems. I want to know if a candidate understands why things work the way they do, not just how they work. I look for more of an engineer than a DBA, that can be bolted on the top.
I have interviewed too many people that can answer the standard questions: what is the SGA? where would I find the alert log? What is the minimum number redo logs to start a database? But, when asked something like, “what is the most insidious way you can think of to destroy a database? Ok, now how do you fix it?” or “How do you handle a developer whose SQL is very bad?” Book learnin’ an OCP would not give you those answers.
All in all, I want someone who is book learned, but also has an imagination, an OCP alone does not show me that.
TOD – Running netca silently
Posted by Jay Caviness in Tip 'o the day on July 7th, 2008
What do you do when you find yourself on a terminal that does not support X-windows and you need to run netca? If you are not running a RAC database this is not really a problem as you can build a listener.ora manually (you really don’t even need that with auto registration). However, on RAC, netca is the only supported method to add the listeners to the cluster.
You can run netca silently and it is very simple, first locate the netca.rsp file which is generally on your installation software under response files. If you don’t have it handy, I will put it at the bottom of this post (without comments):
To run netca silently set your $ORACLE_HOME appropriately and run the following command:
netca /silent /nodeinfo node1,node2...nodeX
/responsefile /(location of netca.rsp)example:
netca /silent /nodeinfo shiner,tecate,beartrap
/responsefile /home/oracle/netca.rspYou will only need to modify netca.rsp if you are using a non-standard default port (not 1521)
netca.rsp < [GENERAL]
RESPONSEFILE_VERSION="10.0"
CREATE_TYPE= "CUSTOM"
[oracle.net.ca]
INSTALLED_COMPONENTS={"server","net8","javavm"}
INSTALL_TYPE=""typical""
LISTENER_NUMBER=1
LISTENER_NAMES={"LISTENER"}
LISTENER_PROTOCOLS={"TCP;1521"}
LISTENER_START=""LISTENER""
NSN_NUMBER=1
NSN_NAMES={"EXTPROC_CONNECTION_DATA"}
NSN_SERVICE = {"PLSExtProc"}
NSN_PROTOCOLS={"TCP;HOSTNAME;1521"}