Archive for category RAC

11gR2 new features: SCAN

One really nice new feature of 11gR2 is the SCAN (Single Client Access Name). A SCAN is the single point of access for all applications connecting to an 11gR2 RAC cluster, and it allows consistent connections without the need to know how many nodes are in the cluster. VIPs are still used internally, and can still be used for connections, but the initial connection to the cluster is made via the SCAN, and connections to any database in the cluster can be made through it. No longer will a DBA need to build those large, complicated tnsnames.ora entries or JDBC (thin) connection strings. Everything can be accessed through the SCAN:

sqlplus joeuser/joespasswd@dodge-scan:1521/proddb
or
jdbc:oracle:thin:@dodge-scan:1521/proddb

One SCAN is needed for each cluster. It is a single DNS entry with three IP addresses attached to the name and set to round-robin (unless you are using GNS, but that is another post). The IP addresses must be unused (just as for a VIP). A good naming technique is to name the SCAN after the cluster it is created for. For example, one of the clusters I use regularly contains three nodes, caravan, stratus and durango, and is affectionately called the Dodge cluster, hence the SCAN name dodge-scan used above. Once the networking folks have created the DNS entry, it is ready to use; the SCAN itself is created in the cluster at installation. Note that VIPs are still needed and must still reside in DNS.
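Once the DNS entry exists, a quick sanity check is to confirm the SCAN name really resolves to three addresses. The Python sketch below does that with the standard socket module. The name dodge-scan comes from the example above, port 1521 is just the listener port used in the connection strings, and the "exactly three" rule reflects the three-address DNS entry described here (it does not cover GNS setups). The helper names are hypothetical.

```python
import socket


def scan_addresses(scan_name: str, port: int = 1521) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses DNS gives for scan_name."""
    infos = socket.getaddrinfo(scan_name, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) tuple.
    return sorted({info[4][0] for info in infos})


def looks_like_valid_scan(addresses: list[str]) -> bool:
    """A SCAN DNS entry is expected to carry exactly three addresses."""
    return len(addresses) == 3
```

For example, scan_addresses("dodge-scan") on a correctly configured network should return three addresses, and looks_like_valid_scan() flags anything else. Because the list is sorted, the result does not depend on which address the round-robin happens to put first.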

Anatomy of a SCAN
How does it all work? A new set of cluster processes called SCAN listeners runs in the cluster, one for each of the three SCAN addresses. If you have more than three nodes, that is OK: regardless of the number of nodes, there will be at most three SCAN listeners. If any of these clustered processes fails, it is automatically restarted on another node. When a client makes a new connection request, it looks up the SCAN name in DNS; DNS round-robins among the three IP addresses assigned to that name, and the request goes to the SCAN listener on the address returned. Each SCAN listener keeps updated cluster load statistics and uses them to pick the appropriate node, returning that node's VIP to the client. The client then connects directly to the returned VIP, as in previous versions, and the connection is made through a standard listener.
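To make the flow concrete, here is a toy Python model of the steps just described: DNS rotating through the three SCAN addresses, then a SCAN listener redirecting the client to a node VIP. It is only an illustration under stated assumptions. The IP addresses and VIP names are hypothetical, and "load" is modeled as a plain session count, whereas the real SCAN listeners use the cluster load statistics mentioned above.

```python
from itertools import cycle


class ToyScan:
    """Toy model of an 11gR2 SCAN: round-robin DNS, then a load-based redirect."""

    def __init__(self, scan_ips, node_sessions):
        # DNS round-robin: each lookup hands back the next SCAN IP in rotation.
        self._dns = cycle(scan_ips)
        # Hypothetical load metric: current session count per node VIP.
        self.node_sessions = dict(node_sessions)

    def connect(self):
        """Return (scan_ip, vip) for one new client connection."""
        scan_ip = next(self._dns)  # 1. client resolves the SCAN name
        # 2. the SCAN listener on scan_ip redirects to the least loaded VIP
        #    (ties go to the VIP listed first)
        vip = min(self.node_sessions, key=self.node_sessions.get)
        # 3. client connects to the standard listener on that VIP
        self.node_sessions[vip] += 1
        return scan_ip, vip
```

For example, ToyScan(["10.0.0.11", "10.0.0.12", "10.0.0.13"], {"caravan-vip": 5, "stratus-vip": 2, "durango-vip": 3}).connect() returns ("10.0.0.11", "stratus-vip"). In this model every SCAN listener sees the same load figures, so it does not matter which SCAN address DNS hands out.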

Can connections still be made directly through the VIPs? Yes, connections via the virtual IP addresses are still supported, though I cannot yet see a reason to use the old method. Will RAC services work with the SCAN? Yes, there is no difference in that functionality. SCANs are a layer on top of the old method of connecting to a RAC database and do not interfere with any current methods.

The SCAN provides a simple and effective means of connecting to any and all instances in a cluster without hard-coding connection information that can change. This has been needed for years, and it is one of the best new features of 11gR2 for preventing future DBA headaches.

TOD – Most cluster issues

As DBAs, we have all heard the old axiom that 80-90% of database performance issues are query related. I have a similar axiom about Oracle RAC: 90% of all cluster startup issues are related either to disk (voting/OCR) or to the interconnect.

Today I forgot the second part of that axiom when I could not get two nodes of a three-node cluster started. I ignored the interconnect for two reasons. First, I had checked it with ifconfig, and the NIC was up on all three nodes. Second, the error in the ocssd.log file went on and on about:

clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(4715) LATS(1135926) Disk lastSeqNo(4715)

with the counts incrementing and the log growing rapidly. After changing several settings in the multipathing configuration and in /etc/udev/rules.d, I finally tried the old test:

ping -b 1.1.1.255

That is, a broadcast ping across the full range of the interconnect subnet. Nodes 2 and 3 could see each other, but node 1 could see only itself. Once I fixed the private VLAN issue, all was well and the cluster came straight up.

The moral of this story: just because it walks like a disk problem, talks like a disk problem and acts like a disk problem, in Clusterware it might just be a network issue.

Why crossover cables are not supported in RAC

Many Oracle shops use a crossover cable, literally a single network cable run between the two servers, as the interconnect for a two-node RAC cluster. Does this work? Yep, you bet. Is it supported? No. Why? It all has to do with how a node reacts when its sister fails in a two-node cluster.

Each node in the cluster constantly checks on the other nodes through both the network (interconnect) and storage (voting disks). If one or both are lost, the node is instructed to commit suicide and reboot itself, in hopes of rejoining the cluster healthy and happy.

If a crossover cable is used and one of the nodes drops, the remaining node has to wait for the TCP timeout, generally 60-300 seconds, before it realizes that the lost node is gone. At that point, the cluster removes the lost node. What can happen during that wait is twofold: the surviving node can lock up, literally freezing while it waits for the timeout, and/or the cluster can become very confused if the dead node restarts and attempts to rejoin while the cluster still thinks it is there. Strange things have been known to happen: many errors are thrown, and at times both nodes are evicted and restart.

With a switch between the nodes, a signal is sent immediately if a node quits responding. The surviving node then checks for 60 seconds before evicting the failing node, which can rejoin a clean cluster without any problems once it reboots.

In short, crossover cables are fine in an emergency or in development, or any situation where failover is not critical. For production, spend the money on a good switch (two, in fact, if you can bond your NICs, but that's for another post) for the best chance of surviving a failover with as few issues as possible.

TOD – olsnodes hangs temporarily

When using RAC, I have found it handy to modify oraenv to run olsnodes -n -l, which reports the local node's number, and append that number to the database name to get the real SID:

/home/oracle:()$ . oraenv
ORACLE_SID = [oracle] ? qa
/home/oracle:(qa1)$ echo $ORACLE_SID
qa1
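oraenv itself is a shell script, so the real change lives in shell, but the logic is small enough to sketch in Python. The sketch assumes olsnodes -n -l prints a single line with the local node name followed by its node number (for example caravan 1) and that olsnodes is on the PATH; sid_from_olsnodes and local_sid are hypothetical helper names, not part of any Oracle tool.

```python
import subprocess


def sid_from_olsnodes(db_name: str, olsnodes_output: str) -> str:
    """Append the local node number to db_name, e.g. ('qa', 'caravan 1') -> 'qa1'.

    Assumes the output is one line: local node name, whitespace, node number.
    """
    fields = olsnodes_output.split()
    if len(fields) != 2 or not fields[1].isdigit():
        raise ValueError(f"unexpected olsnodes output: {olsnodes_output!r}")
    return f"{db_name}{fields[1]}"


def local_sid(db_name: str) -> str:
    """Run `olsnodes -n -l` (must be on PATH) and build this node's SID."""
    result = subprocess.run(
        ["olsnodes", "-n", "-l"], capture_output=True, text=True, check=True
    )
    return sid_from_olsnodes(db_name, result.stdout)
```

Keeping the parsing separate from the subprocess call makes the string handling easy to test on any machine, even one without Clusterware installed.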

I find this easier than setting the SID manually, and it works the same way on every node. One issue I have run into is that after a node has been up for a while, there can be a lag when using oraenv to set the SID.

Tracing the problem, I found it would hang for several seconds on the olsnodes command. Minor, but annoying, especially when in a hurry. In looking for the cause, I discovered that olsnodes writes a log file to the $CRS_HOME/log//client directory. If there are a large number of log files (css*.log) there, it slows down, because Unix has to create a new inode for the new file and has to take the number of files in the directory into account when allocating it.

The obvious solution is to remove these files. In most cases a plain rm *.log will fail with "Argument list too long" because the file list is too long, so use find instead, run from inside the client directory:

 find . -name "*.log" -exec rm -f {} \;

A better long-term fix is a cron job that removes old logs automatically:

 00 03 * * * /usr/bin/find /oracle/product/10.2.0/crs_1/log/lx52/client \( -name "css*.log" -o -name "*.trc" \) -mtime +1 -exec /bin/rm -f {} \;

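In the example above, the job runs at 3am and removes all css*.log and *.trc files in the client directory that have not been modified for more than one full day. Note that find drops the fractional part when counting days, so -mtime +1 matches only files last modified at least 48 hours before the job runs; use -mtime +0 if you want everything older than 24 hours removed.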
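The -mtime rounding is easy to get wrong, so here is a small Python sketch that mimics it, plus a helper that lists which files the cron job would remove. It assumes GNU find semantics (age counted in whole days, fractional part discarded) and, unlike find, it looks only at the top level of the directory. The function names are hypothetical.

```python
import fnmatch
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def matches_mtime_plus(age_seconds: float, n: int) -> bool:
    """Mimic GNU find's `-mtime +n`: drop the fractional day, then require > n days."""
    whole_days = int(age_seconds // SECONDS_PER_DAY)
    return whole_days > n


def stale_files(directory, patterns=("css*.log", "*.trc"), n=1, now=None):
    """List files in `directory` (top level only) that `-mtime +n` would match."""
    now = time.time() if now is None else now
    hits = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        # find's -name is case-sensitive, so use fnmatchcase rather than fnmatch.
        if not any(fnmatch.fnmatchcase(entry.name, p) for p in patterns):
            continue
        if matches_mtime_plus(now - entry.stat().st_mtime, n):
            hits.append(entry.path)
    return sorted(hits)
```

With the job running at 3am, a log written at midnight the day before (27 hours old) is kept, while anything 48 or more hours old is removed.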

TOD – Trust, but verify that private vlan

To work properly, interconnect traffic must be sequestered on a private, non-routable VLAN. This means that the interconnect address 1.1.1.1 (commonly used for interconnects) on one node must not be reachable from a node in a different cluster that uses the same address range. Just this week I saw what happens when it is not quite right: the first node could ping/ssh to 1.1.1.2 on the second node, but when the second node tried to ssh back to 1.1.1.1, the login session went to an entirely different server. No wonder the install threw a hissy fit!

One way to check is to run ifconfig (or a similar tool on other platforms) for the NIC carrying the interconnect. It will show a broadcast IP, usually something like 1.1.1.255. Take that IP and run the following:

ping -b 1.1.1.255

If the result is similar to:

/home/oracle:(+ASM2)$ ping -b 1.1.1.255
WARNING: pinging broadcast address
PING 1.1.1.255 (1.1.1.255) 56(84) bytes of data.
64 bytes from 1.1.1.3: icmp_seq=0 ttl=64 time=0.040 ms
64 bytes from 1.1.1.1: icmp_seq=0 ttl=64 time=0.101 ms (DUP!)
64 bytes from 1.1.1.2: icmp_seq=0 ttl=64 time=0.244 ms (DUP!)

and .1 through .3 are the nodes in your cluster, then it is a good bet that they are on a private VLAN. However (caveat warning!), this does not guarantee it: the network may currently be used only by those three hosts, and others could be added later. If you see replies from other subnets as well as the interconnect subnet, then you can be sure it is NOT a private VLAN.
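If you run this check often, the reply lines are easy to parse. The Python sketch below assumes the Linux ping output format shown above, a list of the interconnect IPs you expect, and the interconnect subnet (1.1.1.0/24 is just the example used in this post); check_broadcast_ping is a hypothetical helper name. As the caveat says, an empty result only makes a private VLAN likely; it does not prove it.

```python
import ipaddress
import re

# Reply lines look like: "64 bytes from 1.1.1.3: icmp_seq=0 ttl=64 time=0.040 ms"
REPLY = re.compile(r"bytes from (\d{1,3}(?:\.\d{1,3}){3})")


def check_broadcast_ping(ping_output, cluster_ips, subnet):
    """Return (strangers, off_subnet) from `ping -b` output.

    strangers:  responders that are not one of the expected interconnect IPs
    off_subnet: responders outside the interconnect subnet (definitely not private)
    """
    net = ipaddress.ip_network(subnet)
    expected = set(cluster_ips)
    responders = set(REPLY.findall(ping_output))
    strangers = sorted(ip for ip in responders if ip not in expected)
    off_subnet = sorted(
        ip for ip in responders if ipaddress.ip_address(ip) not in net
    )
    return strangers, off_subnet
```

Fed the output shown above with cluster IPs 1.1.1.1 through 1.1.1.3, both lists come back empty; any extra responder shows up in strangers, and one from another subnet also shows up in off_subnet.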

One final tip: don't put the interconnect IP/name in DNS. It should be visible only to the hosts in the cluster.

Parameter, what parameter?

An interesting problem came up recently at a client's site. They have a four-node Linux RAC cluster running 10.2.0.3 and were experiencing instance evictions every now and again due to running out of shared memory. While it was odd enough that at times the 4030 error would actually kill an instance and force the cluster to restart it, the real question was why they were getting 4030s in the first place. Their SGA_TARGET was set to 2800m and SGA_MAX_SIZE to 3G. How could it all be used up? Well, my friend, the truth is stranger than fiction.

It didn't become clear until the spfile was actually dumped for review. SGA_TARGET was set in two ways:

live1.sga_target=1800m
live2.sga_target=1800m
live3.sga_target=1800m
live4.sga_target=1800m
*.sga_target=2800m

Well, if you did a show parameter sga_target you got 2800M, but because the parameter was also set instance-specific, and an instance-specific setting overrides the database-wide (*) one, each instance was really getting only 1800m, which was not enough for their applications. Very odd; it isn't listed as a bug, and it never would have been found if the spfile had not been dumped. So watch out for instance-specific parameters if you also set the same parameter database-wide!
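The trap comes down to precedence: for a given instance, an entry prefixed with that instance's SID wins over the *. entry. The Python sketch below resolves a parameter from pfile-style lines, such as those produced by dumping the spfile, under that rule. It is a simplified, hypothetical helper: it skips blank and # comment lines but does not handle quoting, trailing comments or multi-line values.

```python
def effective_parameter(spfile_lines, sid, name):
    """Return the value `name` takes for instance `sid`, or None if unset.

    An entry like 'live1.sga_target=1800m' overrides '*.sga_target=2800m'
    for instance live1; other instances fall back to the '*' entry.
    """
    wildcard = specific = None
    for raw in spfile_lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        scope, _, param = key.partition(".")
        if param.lower() != name.lower():
            continue
        if scope == "*":
            wildcard = value
        elif scope.lower() == sid.lower():
            specific = value
    return specific if specific is not None else wildcard
```

Run against the five lines above, effective_parameter(lines, "live2", "sga_target") gives 1800m, and only an instance with no SID-specific entry (a hypothetical live5, say) would see 2800m.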

TOD – Interconnect determination query

Ever not quite sure whether your interconnect is on a private VLAN? Below is a simple query to find out. Run it as SYS, since x$ tables are visible only to SYS.

This example is from an AIX box, but it works on all platforms.

SELECT INST_ID,
       PUB_KSXPIA,
       PICKED_KSXPIA,
       NAME_KSXPIA,
       IP_KSXPIA
  FROM x$ksxpia;
 
INST_ID      P   PICK  NAME_KSXPIA         IP_KSXPIA
------------ -   ----  ------------------- ---------------------
3            Y   OCR   en4                 10.11.12.13

If P (PUB_KSXPIA) is Y, then the interconnect is on a public network, not a private VLAN, and you should have a chat with your network admin.

DESC x$ksxpia
Name                         NULL?         Type
ADDR                                       RAW(8)
INDX                                       NUMBER
INST_ID                                    NUMBER
PUB_KSXPIA                                 VARCHAR2(1)
PICKED_KSXPIA                              VARCHAR2(4)
NAME_KSXPIA                                VARCHAR2(15)
IP_KSXPIA                                  VARCHAR2(16)
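To check several instances at once, the query report can be parsed mechanically. This Python sketch assumes exactly the five-column layout of the output shown above (INST_ID, P, PICK, NAME_KSXPIA, IP_KSXPIA, separated by whitespace) and flags every row whose P flag is Y. parse_ksxpia and public_interconnects are hypothetical helper names.

```python
def parse_ksxpia(report: str):
    """Parse the x$ksxpia query output into one dictionary per row."""
    rows = []
    for line in report.splitlines():
        fields = line.split()
        # Data rows have five columns and start with a numeric INST_ID;
        # this skips the header and the dashed separator line.
        if len(fields) == 5 and fields[0].isdigit():
            inst_id, pub, picked, nic, ip = fields
            rows.append({"inst_id": int(inst_id), "public": pub == "Y",
                         "picked": picked, "nic": nic, "ip": ip})
    return rows


def public_interconnects(report: str):
    """Rows whose P (PUB_KSXPIA) flag is Y: time for a chat with the network admin."""
    return [row for row in parse_ksxpia(report) if row["public"]]
```

Any row returned by public_interconnects() is an interconnect that the rule above says is not on a private VLAN.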