TheBonsai's Blog

About the days and nights of TheBonsai

Archive for the 'Oracle' Category

10gR2 (10.2.0.5) on top of existing 11gR2 (11.2.0.3) GI

February 5th, 2012 by TheBonsai

Hello Oracle fans and victims out there,

I tried to create a 10.2.0.5 database on top of an existing 11.2.0.3 GI on a 2-node RAC, alongside an 11.2.0.3 RDBMS database that was already running fine.

I installed 10.2.0.1, the 10.2.0.5 patch set and the 10.2.0.5.6 PSU… worked like a charm.

After pinning the nodes (remember, you have to do that for a pre-11gR2 database on 11gR2 Clusterware!), I created the RAC database with DBCA… worked like a charm.

I started up the database with srvctl – crash. Both instances were killed by their LMON because of a KGXGN polling error. It looks like this in the alert logs:

Instance #1:

Sat Feb 04 12:44:20 CET 2012
lmon registered with NM – instance id 1 (internal mem no 0)
Sat Feb 04 12:46:50 CET 2012
oracle@svdbslx060 (LMON) (ospid: 28370) detects hung instances during IMR reconfiguration
oracle@svdbslx060 (LMON) (ospid: 28370) tries to kill the instance 2.
Please check instance 2’s alert log and LMON trace file for more details.
Sat Feb 04 12:48:05 CET 2012
Remote instance kill is issued with system inc 0 and reason 0x20000000
Remote instance kill map (size 1) : 2
Sat Feb 04 12:49:20 CET 2012
Error: KGXGN polling error (15)
Sat Feb 04 12:49:20 CET 2012
Errors in file /opt/oracle/base/admin/REDSYS/bdump/redsys1_lmon_28370.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Sat Feb 04 12:49:20 CET 2012
System state dump is made for local instance
Sat Feb 04 12:49:20 CET 2012
Errors in file /opt/oracle/base/admin/REDSYS/bdump/redsys1_diag_28366.trc:
ORA-29702: error occurred in Cluster Group Service operation
Sat Feb 04 12:49:20 CET 2012
Trace dumping is performing id=[cdmp_20120204124920]
Sat Feb 04 12:49:20 CET 2012
Instance terminated by LMON, pid = 28370

Instance #2:

Sat Feb 04 12:44:20 CET 2012
lmon registered with NM – instance id 2 (internal mem no 1)
Sat Feb 04 12:49:20 CET 2012
Error: KGXGN polling error (15)
Sat Feb 04 12:49:20 CET 2012
Errors in file /opt/oracle/base/admin/REDSYS/bdump/redsys2_lmon_1695.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Sat Feb 04 12:49:20 CET 2012
Trace dumping is performing id=[cdmp_20120204124920]
System state dump is made for local instance
Sat Feb 04 12:49:21 CET 2012
Errors in file /opt/oracle/base/admin/REDSYS/bdump/redsys2_diag_1691.trc:
ORA-29702: error occurred in Cluster Group Service operation
Sat Feb 04 12:49:21 CET 2012
Trace dumping is performing id=[cdmp_20120204124921]
Sat Feb 04 12:49:21 CET 2012
Instance terminated by LMON, pid = 1695

The LMON trace files told me nothing more than the alert logs. I installed the 10.2.0.5.2 CRS Bundle into this ORACLE_HOME – no change. The internet – including MOS – pointed in many directions, but nothing really seemed to match.

Finally, after a day off (you sometimes need distance!) I got it:

The error messages strongly point to an interconnect problem. The fact that a second (11gR2) database works fine at the same time rules out the physical layer and other problems at the base of the stack. The servers and the 11gR2 world have 2 interconnect interfaces. Solution: I hard-coded the IP of one specific interconnect interface into the init.ora parameters of the 10gR2 instance, and we had lift-off. It works like a charm.
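For anyone wanting to try the same fix, here is a minimal sketch of what such init.ora entries could look like. The post does not name the parameter; I am assuming CLUSTER_INTERCONNECTS, which is the usual way to pin an instance to one private address. The instance names are guessed from the trace file names above, and the IP addresses are pure placeholders.

```shell
#!/bin/sh
# Sketch only: prints per-instance init.ora entries that pin the interconnect
# to one private IP. CLUSTER_INTERCONNECTS is an assumption (the post does not
# name the parameter); SIDs and IPs below are placeholders.
interconnect_line() {
    sid=$1   # instance name, e.g. REDSYS1
    ip=$2    # private IP of the chosen interconnect interface on that node
    printf "%s.cluster_interconnects='%s'\n" "$sid" "$ip"
}

interconnect_line REDSYS1 192.168.10.1
interconnect_line REDSYS2 192.168.10.2
```

Each instance gets the private IP of its own node, so the entries have to be per instance rather than one global value.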


SQLNET.RECV_TIMEOUT/SEND_TIMEOUT and RMAN

November 2nd, 2011 by TheBonsai

Hi there,

I was analyzing an unexpected RMAN termination, an RMAN-10038/RMAN-03009 combo:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of REFAF command on c13 channel at 10/31/2011 08:21:50
RMAN-10038: database session for channel c13 terminated unexpectedly

Nothing, not even an RMAN trace, revealed more hints. The trace just said the same thing in other words: no underlying ORA/TNS error or similar.

It finally turned out to be some parameters recently added to sqlnet.ora: I had set SQLNET.RECV_TIMEOUT and SQLNET.SEND_TIMEOUT, and the beast silently dropped RMAN channels that had been idle for a while. The next command on such a channel blew up the whole RUN block of the backup script.

I removed the parameters and it works again.

Be careful with your sqlnet.ora :-)
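If you suspect the same cause, a quick check along these lines shows whether a sqlnet.ora has the two parameters active. This is only a sketch: the helper name find_sqlnet_timeouts is made up for illustration, and the file location assumes $TNS_ADMIN or the default network/admin directory under ORACLE_HOME.

```shell
#!/bin/sh
# Sketch: list active SQLNET.RECV_TIMEOUT / SQLNET.SEND_TIMEOUT settings.
# Lines commented out with a leading # do not match and are ignored.
find_sqlnet_timeouts() {
    file=$1
    grep -iE '^[[:space:]]*SQLNET\.(RECV|SEND)_TIMEOUT[[:space:]]*=' "$file"
}

# Assumed location: $TNS_ADMIN if set, otherwise the default under ORACLE_HOME.
sqlnet_ora="${TNS_ADMIN:-$ORACLE_HOME/network/admin}/sqlnet.ora"
if [ -f "$sqlnet_ora" ]; then
    find_sqlnet_timeouts "$sqlnet_ora" \
        || echo "no timeout parameters set in $sqlnet_ora"
fi
```

Keep in mind that the server side and the RMAN client may read different sqlnet.ora files, so it can be worth checking both.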

 


Escaping special characters in SQL*Plus logon strings

March 24th, 2011 by TheBonsai

SQL*Plus connect strings/logon strings have a couple of special characters, notably these two:

  • / (slash) to separate username and password
  • @ (at) to separate the credentials from the TNS descriptor string

If you need to use those characters literally in the logon string, you need to wrap the affected part in literal double quotes (literal means: the quotes must be passed through to SQL*Plus itself; they are not UNIX shell quoting):

  • Less readable:
    $ sqlplus USER/\"PASS/WORD\"
  • More readable:
    $ sqlplus USER/'"PASS/WORD"'
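To see what SQL*Plus actually receives, you can let the shell print the argument instead of starting sqlplus. This is just a sketch: show_arg is a throwaway helper, and the third line, with an @ in the password plus a TNS alias called ORCL, is a made-up illustration following the same quoting idea.

```shell
#!/bin/sh
# Print the first argument exactly as a program such as sqlplus would see it,
# i.e. after the shell has removed its own quoting.
show_arg() {
    printf '%s\n' "$1"
}

show_arg USER/\"PASS/WORD\"      # backslash-escaped double quotes
show_arg USER/'"PASS/WORD"'      # double quotes protected by single quotes
show_arg USER/'"P@SS"'@ORCL      # hypothetical: @ inside the password, then a TNS alias
```

The first two calls print the same string, USER/"PASS/WORD", which shows that both spellings hand identical arguments to SQL*Plus; the choice between them is purely about readability.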
