TheBonsai's Blog

About the days and nights of TheBonsai

Archive for the 'Work' Category

20 years of *NIX stuff

May 8th, 2016 by TheBonsai

Heyah,

Yesterday I celebrated my 36th birthday (yes, I really got old..!).

I suddenly realized that I can also look back on 20 years of Linux and UNIX passion. I don’t know the exact day it started, so I just declared it to be my birthday!

So I am also celebrating 20 years of personal *NIX passion!

Happy anniversary Jan and *NIX 😉

Category: english, Linux, Technology, Work

10gR2 (10.2.0.5) on top of existing 11gR2 (11.2.0.3) GI

February 5th, 2012 by TheBonsai

Hello Oracle fans and victims out there,

I tried to create a 10.2.0.5 database on top of an existing 11.2.0.3 GI on a 2-node RAC, alongside an 11.2.0.3 database that was already running fine.

I installed 10.2.0.1, the 10.2.0.5 patch set, and the 10.2.0.5.6 PSU… worked like a charm.

After pinning the nodes (remember, you have to do that for a pre-11gR2 database on 11gR2 Clusterware!), I created the RAC database with DBCA… worked like a charm.

I started up the database with srvctl – crash. Both instances were killed by their LMON because of a KGXGN polling error. It looks like this in the alert logs:

Instance #1:

Sat Feb 04 12:44:20 CET 2012
lmon registered with NM - instance id 1 (internal mem no 0)
Sat Feb 04 12:46:50 CET 2012
oracle@svdbslx060 (LMON) (ospid: 28370) detects hung instances during IMR reconfiguration
oracle@svdbslx060 (LMON) (ospid: 28370) tries to kill the instance 2.
Please check instance 2's alert log and LMON trace file for more details.
Sat Feb 04 12:48:05 CET 2012
Remote instance kill is issued with system inc 0 and reason 0x20000000
Remote instance kill map (size 1) : 2
Sat Feb 04 12:49:20 CET 2012
Error: KGXGN polling error (15)
Sat Feb 04 12:49:20 CET 2012
Errors in file /opt/oracle/base/admin/REDSYS/bdump/redsys1_lmon_28370.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Sat Feb 04 12:49:20 CET 2012
System state dump is made for local instance
Sat Feb 04 12:49:20 CET 2012
Errors in file /opt/oracle/base/admin/REDSYS/bdump/redsys1_diag_28366.trc:
ORA-29702: error occurred in Cluster Group Service operation
Sat Feb 04 12:49:20 CET 2012
Trace dumping is performing id=[cdmp_20120204124920]
Sat Feb 04 12:49:20 CET 2012
Instance terminated by LMON, pid = 28370

Instance #2:

Sat Feb 04 12:44:20 CET 2012
lmon registered with NM - instance id 2 (internal mem no 1)
Sat Feb 04 12:49:20 CET 2012
Error: KGXGN polling error (15)
Sat Feb 04 12:49:20 CET 2012
Errors in file /opt/oracle/base/admin/REDSYS/bdump/redsys2_lmon_1695.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Sat Feb 04 12:49:20 CET 2012
Trace dumping is performing id=[cdmp_20120204124920]
System state dump is made for local instance
Sat Feb 04 12:49:21 CET 2012
Errors in file /opt/oracle/base/admin/REDSYS/bdump/redsys2_diag_1691.trc:
ORA-29702: error occurred in Cluster Group Service operation
Sat Feb 04 12:49:21 CET 2012
Trace dumping is performing id=[cdmp_20120204124921]
Sat Feb 04 12:49:21 CET 2012
Instance terminated by LMON, pid = 1695

The LMON trace files showed nothing beyond that. I installed the 10.2.0.5.2 CRS Bundle into this ORACLE_HOME, with no change. The internet, including MOS, gave hints in many directions, but nothing really seemed to match.

Finally, after a day off (you sometimes need distance!) I got it:

The error messages strongly point to an interconnect problem. The fact that a second (11gR2) database works fine at the same time rules out the physical layer and other problems at the base of the stack. The servers and the 11gR2 world have two interconnect interfaces. Solution: I hard-coded the IP of one specific interconnect interface into the init.ora parameters of the 10gR2 instance, and we had lift-off. It works like a charm.
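The post does not name the parameter. On 10gR2 the usual candidate for tying an instance to one private interface is CLUSTER_INTERCONNECTS, so a minimal sketch of the init.ora entries, under that assumption, could look like the following. The SID prefixes are guessed from the trace file names in the alert logs above, and the IP addresses are purely hypothetical placeholders for each node's address on the chosen interconnect interface.

```
# init.ora sketch (assumption: CLUSTER_INTERCONNECTS is the parameter meant above)
# SID prefixes are guessed from the trace file names; the IPs are hypothetical placeholders
REDSYS1.cluster_interconnects='192.168.10.1'
REDSYS2.cluster_interconnects='192.168.10.2'
```

Each instance gets the address of its own node, and since the parameter is static the instances have to be restarted to pick it up. A hard-coded address also means the instance depends on that one interface, so this trades interconnect redundancy for compatibility.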

Category: Oracle, Work

SQLNET.RECV_TIMEOUT/SEND_TIMEOUT and RMAN

November 2nd, 2011 by TheBonsai

Hi there,

I was analyzing an unexpected RMAN termination, an RMAN-10038/RMAN-03009 combo:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of REFAF command on c13 channel at 10/31/2011 08:21:50
RMAN-10038: database session for channel c13 terminated unexpectedly

Nothing, not even an RMAN trace, revealed more hints. The trace just said the same thing in other words: no underlying ORA/TNS error or anything similar.

It finally turned out to be some parameters I had recently added to sqlnet.ora: I had set SQLNET.RECV_TIMEOUT and SQLNET.SEND_TIMEOUT, and the beast silently dropped RMAN channels that had been idle for a while. The next command on such a channel blew up the whole RUN block of the backup script.
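For illustration, entries of this kind have the following shape. The values are made up, since the post does not say what was actually set; both parameters take a time in seconds.

```
# sqlnet.ora sketch; the 600-second values are hypothetical, not taken from the post
SQLNET.RECV_TIMEOUT = 600
SQLNET.SEND_TIMEOUT = 600
```

With settings like these, a session that waits longer than the limit for data from the other side gets terminated. That would fit an RMAN channel that sits idle while the other channels are still busy.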

I removed the parameters and it works again.

Be careful with your sqlnet.ora :-)

 

Category: english, Oracle, Work