A few days ago, in the wee hours of the morning, the database instance on one of my RAC nodes crashed.
Env: 10.2.0.4 RAC on Solaris 5.10 SPARC
alert.log said:
:
Errors in file /u01/app/oracle/admin/SHCL1/bdump/shcl1n02_j004_27414.trc:
ORA-07445: exception encountered: core dump [kglobcl()+412] [SIGSEGV] [Address not mapped to object] [0x49415C002] [] []
Tue Nov 3 00:00:27 2009
Trace dumping is performing id=[cdmp_20091103000027]
Tue Nov 3 00:00:37 2009
Errors in file /u01/app/oracle/admin/SHCL1/bdump/shcl1n02_pmon_3278.trc:
ORA-07445: [kglobcl()+412] [SIGSEGV] [Address not mapped to object] [0x49415C002] [] []
Tue Nov 3 00:00:51 2009
:
:
Tue Nov 3 00:00:54 2009
MMAN: terminating instance due to error 472
Instance terminated by MMAN, pid = 3368
Wed Nov 4 10:44:47 2009
:
I checked the clusterware, the nodeapps, and the database status:
$ crsctl check crs
OK (running successfully)
$ srvctl status nodeapps -n myjpsuolicdbd02
OK (all resources running successfully)
$ srvctl status database -d SHCL1_PRMY
Instance SHCL1N01 is running on node myjpsuolicdbd01
PRKO-2015 : Error in checking condition of instance on node: myjpsuolicdbd02
$
Since only instance 2 was down, I tried to start it and got the error below.
$ srvctl start instance -d SHCL1_PRMY -i SHCL1N02
PRKP-1001 : Error starting instance SHCL1N02 on node myjpsuolicdbd02
CRS-1028: Dependency analysis failed because of:
CRS-0223: Resource 'ora.SHCL1_prmy.SHCL1N02.inst' has placement error.
$
The crsd.log file didn't have much to-the-point information either.
2009-11-04 10:18:30.627: [ CRSRES][2970821] CRS-1028: Dependency analysis failed because of:'Resource in UNKNOWN state: ora.SHCL1_prmy.SHCL1N02.inst'
2009-11-04 10:21:23.269: [ CRSRES][2970843] StopResource: setting CLI values
2009-11-04 10:21:23.340: [ CRSRES][2970843] Attempting to stop `ora.SHCL1_prmy.SHCL1N02.inst` on member `myjpsuolicdbd02`
2009-11-04 10:21:30.478: [ CRSAPP][2970843] StopResource error for ora.SHCL1_prmy.SHCL1N02.inst error code = 1
2009-11-04 10:21:30.502: [ CRSRES][2970843] Stop of `ora.SHCL1_prmy.SHCL1N02.inst` on member `myjpsuolicdbd02` succeeded.
2009-11-04 10:21:49.867: [ CRSRES][2970861] startRunnable: setting CLI values
2009-11-04 10:21:49.895: [ CRSRES][2970861] Attempting to start `ora.SHCL1_prmy.SHCL1N02.inst` on member `myjpsuolicdbd02`
2009-11-04 10:21:55.019: [ CRSAPP][2970861] StartResource error for ora.SHCL1_prmy.SHCL1N02.inst error code = 1
2009-11-04 10:22:00.583: [ CRSAPP][2970861] StopResource error for ora.SHCL1_prmy.SHCL1N02.inst error code = 1
2009-11-04 10:22:00.592: [ CRSRES][2970861] X_OP_StopResourceFailed : Stop Resource failed
(File: rti.cpp, line: 1803
2009-11-04 10:22:00.593: [ CRSRES][2970861][ALERT] `ora.SHCL1_prmy.SHCL1N02.inst` on member `myjpsuolicdbd02` has experienced an unrecoverable failure.
2009-11-04 10:22:00.593: [ CRSRES][2970861] Human intervention required to resume its availability.
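When crsd.log is this noisy, it can help to pull out just the resources it reports as UNKNOWN. The following is only a sketch: the function name unknown_resources is hypothetical, the log path in the comment is an assumption (the usual 10.2 layout under the CRS home), and it expects a POSIX shell and sed.

```shell
# Print each resource that crsd.log reports as being in UNKNOWN state, once.
# Reads the named log file(s), or standard input when no file is given.
unknown_resources() {
    sed -n "s/.*Resource in UNKNOWN state: \([^']*\)'.*/\1/p" "$@" | sort -u
}

# Example (assumed path; ORA_CRS_HOME is a hypothetical variable for the CRS home):
# unknown_resources "$ORA_CRS_HOME/log/$(hostname)/crsd/crsd.log"
```

Any resource listed here has to be forced back to OFFLINE before a start will be accepted.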
Clusterware and nodeapps were up and running; only the db instance on node 2 was down.
I could start the instance from sqlplus, but that would not start the db services, and ora.SHCL1_prmy.SHCL1N02.inst would still not be ONLINE, so the instance would be of no use in a RAC.
I didn't have a clue until I came across two logs under /u01/app/oracle/product/10.2.0/db_1/log/<hostname>/racg/: imonSHCL1_prmy.log and imon_SHCL1_prmy.log.
These files record the details of starting the DB instance with the SRVCTL command:
SQL*Plus: Release 10.2.0.4.0 - Production on Wed Nov 4 18:59:48 2009
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Enter user-name: ERROR:
ORA-01031: insufficient privileges
Enter user-name: SP2-0306: Invalid option.
2009-11-04 18:59:48.968: [ RACG][176] [28813][176][ora.SHCL1_prmy.SHCL1N02.inst]: Usage: CONN[ECT] [logon] [AS {SYSDBA|SYSOPER}]
where <logon> ::= <username>[/<password>][@<connect_identifier>] | /
Enter user-name: Enter password:
ERROR:
ORA-01005: null password given; logon denied
2009-11-04 18:59:48.968: [ RACG][176] [28813][176][ora.SHCL1_prmy.SHCL1N02.inst]: SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus
2009-11-04 18:59:48.968: [ RACG][176] [28813][176][ora.SHCL1_prmy.SHCL1N02.inst]: clsrcexecut: env _USR_ORA_PFILE=
2009-11-04 18:59:48.968: [ RACG][176] [28813][176][ora.SHCL1_prmy.SHCL1N02.inst]: clsrcexecut: cmd = /u01/app/oracle/product/10.2.0/db_1/bin/racgeut -e _USR_ORA_DEBUG=0 -e ORACLE_SID=SHCL1N02 520 /u01/app/oracle/product/10.2.0/db_1/bin/racgmdb -d abort
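The give-away in this log is the SQL*Plus login failure. A quick way to surface such lines is sketched below, assuming a POSIX grep (on Solaris 10 that may mean /usr/xpg4/bin/grep). The function name login_failures is hypothetical, and the pattern covers only the error codes seen above.

```shell
# List login-failure lines (with line numbers) from the racg imon logs.
# Reads the named file(s), or standard input when no file is given.
login_failures() {
    grep -n -E 'ORA-01031|ORA-01005|SP2-0157' "$@"
}

# Example, using the log directory mentioned above (replace <hostname>):
# login_failures /u01/app/oracle/product/10.2.0/db_1/log/<hostname>/racg/imon*.log
```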
CAUSE:
For security reasons, I had recently added SQLNET.AUTHENTICATION_SERVICES=(NONE) to sqlnet.ora. This setting disables OS authentication, so a password must be supplied explicitly and "sqlplus / as sysdba" no longer works. SRVCTL relies on that passwordless login to start the instance, hence the failure.
After removing this line from sqlnet.ora, I could start the instance.
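To check quickly whether a node's sqlnet.ora carries this setting, something like the following could be used. It is a minimal sketch: the function name os_auth_disabled is hypothetical, it assumes a POSIX shell and grep, and it only recognises the single-value form NONE or (NONE) on an uncommented line.

```shell
# Succeed (exit status 0) when sqlnet.ora has an active, uncommented
# SQLNET.AUTHENTICATION_SERVICES entry whose value is NONE.
# Reads the named file, or standard input when no file is given.
os_auth_disabled() {
    grep -i -E '^[[:space:]]*SQLNET\.AUTHENTICATION_SERVICES[[:space:]]*=[[:space:]]*\(?[[:space:]]*NONE[[:space:]]*\)?[[:space:]]*$' "$@" >/dev/null
}

# Example (default location; TNS_ADMIN, if set, points elsewhere):
# os_auth_disabled "$ORACLE_HOME/network/admin/sqlnet.ora" && echo "OS authentication is disabled"
```

Running this on every node before a restart would flag the problem without having to dig through the racg logs.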
SOLUTION:
1) Commented out the entry in sqlnet.ora.
2) crs_stat -u => check ora.SHCL1_prmy.SHCL1N02.inst; the STATE should be OFFLINE before we 'start' again.
NAME=ora.SHCL1_prmy.SHCL1N02.inst
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on myjpsuolicdbd02
If the STATE is UNKNOWN, starting the instance will error out, so stop it by force.
3) crs_stop -f ora.SHCL1_prmy.SHCL1N02.inst
4) crs_stat -u => confirm that the TARGET is ONLINE and the STATE is OFFLINE for ora.SHCL1_prmy.SHCL1N02.inst
5) srvctl start instance -d SHCL1_PRMY -i SHCL1N02
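The crs_stat / crs_stop / srvctl steps above could be wrapped into a small script, roughly as follows. This is a sketch under stated assumptions, not a tested procedure: resource_state and recover_instance are hypothetical names, it assumes crs_stat -u prints NAME=/STATE= blocks like the one shown above, and it needs a POSIX shell and awk (on Solaris 10, probably nawk or /usr/xpg4/bin/awk). Fixing sqlnet.ora is still a manual step.

```shell
# Print the STATE (first word only, e.g. UNKNOWN) of one resource,
# given crs_stat-style NAME=/STATE= output on standard input.
resource_state() {
    awk -v res="$1" '
        /^NAME=/  { current = substr($0, 6) }
        /^STATE=/ && current == res { split(substr($0, 7), f, " "); print f[1]; exit }
    '
}

# Force-stop the resource if it is UNKNOWN, confirm it is OFFLINE,
# then start the instance. Returns non-zero if the state is anything else.
recover_instance() {
    res=$1 db=$2 inst=$3
    if [ "$(crs_stat -u | resource_state "$res")" = "UNKNOWN" ]; then
        crs_stop -f "$res" || return 1
    fi
    [ "$(crs_stat -u | resource_state "$res")" = "OFFLINE" ] || return 1
    srvctl start instance -d "$db" -i "$inst"
}

# Example:
# recover_instance ora.SHCL1_prmy.SHCL1N02.inst SHCL1_PRMY SHCL1N02
```

The script deliberately refuses to act when the resource is already ONLINE or in some other state, since forcing a stop there would not be wanted.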
I have raised an SR for the ORA-07445 error; let's see what Oracle Support has in store for me.