Resolving ORA-00600: internal error code, arguments: [15709], [29], [1]

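One instance of a customer's 10.2.0.4 database crashed without warning, and the customer asked us to help find the cause. For a sudden crash like this, the first place we look is the database alert log. It shows that just before the crash the SMON process raised ORA-00600 [15709], followed shortly by the message "Fatal internal error happened while SMON was doing active transaction recovery." In other words, SMON hit an exception while recovering active transactions, and that ultimately brought the instance down. The log output is shown below: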

Fri Sep 26 10:53:35 2014
Errors in file /oracle/app/oracle/admin/wxyydb/bdump/wxyydb_smon_28997.trc:
ORA-00600: internal error code, arguments: [15709], [29], [1], [], [], [], [], []
ORA-30319: Message 30319 not found;  product=RDBMS; facility=ORA
Fri Sep 26 10:53:55 2014
Fatal internal error happened while SMON was doing active transaction recovery.
Fri Sep 26 10:53:55 2014
Errors in file /oracle/app/oracle/admin/wxyydb/bdump/wxyydb_smon_28997.trc:
ORA-00600: internal error code, arguments: [15709], [29], [1], [], [], [], [], []
ORA-30319: Message 30319 not found;  product=RDBMS; facility=ORA
SMON: terminating instance due to error 474
Termination issued to instance processes. Waiting for the processes to exit
Fri Sep 26 10:54:05 2014
Instance termination failed to kill one or more processes
Instance terminated by SMON, pid = 28997
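
When the alert log is long, the ORA- errors leading up to a crash can be pulled out with a small script. The following is a minimal sketch, not an Oracle tool: it assumes the 10g-style alert log layout shown above (a timestamp line such as "Fri Sep 26 10:53:35 2014" followed by message lines) and an English locale for the day and month names. The names ora_errors and SAMPLE_ALERT are hypothetical.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the alert log excerpt above.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
# Message lines of interest start with an ORA-nnnnn code.
ORA_RE = re.compile(r"^(ORA-\d{5})\b")


def parse_ts(line):
    """Return a datetime if the line is a timestamp line, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def ora_errors(alert_text):
    """Return (timestamp, code, line) for every ORA- line in the text."""
    current = None  # most recent timestamp line seen so far
    found = []
    for line in alert_text.splitlines():
        ts = parse_ts(line)
        if ts is not None:
            current = ts
            continue
        m = ORA_RE.match(line)
        if m:
            found.append((current, m.group(1), line.strip()))
    return found


# First entries of the alert log excerpt above.
SAMPLE_ALERT = """\
Fri Sep 26 10:53:35 2014
Errors in file /oracle/app/oracle/admin/wxyydb/bdump/wxyydb_smon_28997.trc:
ORA-00600: internal error code, arguments: [15709], [29], [1], [], [], [], [], []
ORA-30319: Message 30319 not found;  product=RDBMS; facility=ORA
"""

for ts, code, text in ora_errors(SAMPLE_ALERT):
    print(ts, code)
```

Each error is tagged with the nearest preceding timestamp, which makes it easy to see which errors were raised right before the "terminating instance" message.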

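Next, let's look at the trace file wxyydb_smon_28997.trc. It shows SMON repeatedly attempting parallel transaction recovery. During that recovery it hit the ORA-00600 error, and the exception in the underlying code ultimately crashed the instance.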

*** 2014-09-26 10:10:36.236
Parallel Transaction recovery caught error 30319 
*** 2014-09-26 10:15:10.643
Parallel Transaction recovery caught exception 30319
*** 2014-09-26 10:15:21.816
Parallel Transaction recovery caught error 30319 
*** 2014-09-26 10:19:51.707
Parallel Transaction recovery caught exception 30319
*** 2014-09-26 10:53:35.830
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [15709], [29], [1], [], [], [], [], []
ORA-30319: Message 30319 not found;  product=RDBMS; facility=ORA
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+64          call     ksedst1()            000000000 ? 000000001 ?
ksedmp()+2176        call     ksedst()             000000000 ?
                                                   C000000000000C9F ?
                                                   4000000004057F40 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
ksfdmp()+48          call     ksedmp()             000000003 ?
kgeriv()+336         call     ksfdmp()             C000000000000695 ?
                                                   000000003 ?
                                                   40000000095185E0 ?
                                                   00000EC33 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
kgeasi()+416         call     kgeriv()             6000000000031770 ?
                                                   6000000000032828 ?
                                                   4000000001A504E0 ?
                                                   000000002 ?
                                                   9FFFFFFFFFFFA138 ?
$cold_kxfpqsrls()+1  call     kgeasi()             6000000000031770 ?
168                                                9FFFFFFFFD3D2290 ?
                                                   000003D5D ? 000000002 ?
                                                   000000002 ? 0000003E7 ?
                                                   000003D5D ?
                                                   9FFFFFFFFD3D22A0 ?
kxfpqrsod()+1104     call     $cold_kxfpqsrls()    C0000004FDF7A838 ?
                                                   C0000004FDF74430 ?
                                                   000000004 ?
                                                   9FFFFFFFFFFFA200 ?
                                                   C0000000000011AB ?
                                                   4000000003AA1250 ?
                                                   00000EDF5 ? 000000001 ?
kxfpdelqrefs()+640   call     kxfpqrsod()          C0000004FDF74430 ?
                                                   000000001 ?
                                                   60000000000B6300 ?
                                                   C000000000000694 ?
                                                   4000000003DD14F0 ?
                                                   00000EE2D ?
                                                   60000000000C6708 ?
kxfpqsod_qc_sod()+2  call     kxfpdelqrefs()       00000003E ? 000000001 ?
016                                                60000000000B6300 ?
                                                   C000000000001028 ?
                                                   40000000025DE5A0 ?
                                                   4000000001B1A110 ?
                                                   60000000000C2D04 ?
                                                   60000000000C2E90 ?
kxfpqsod()+816       call     kxfpqsod_qc_sod()    000000010 ? 000000001 ?
                                                   9FFFFFFFFFFFA260 ?
                                                   60000000000B6300 ?
                                                   9FFFFFFFFFFFA7F0 ?
                                                   C000000000001028 ?
                                                   40000000025DF810 ?
                                                   00000EE65 ?
ktprdestroy()+208    call     kxfpqsod()           C0000004FDF7A838 ?
                                                   000000001 ?
                                                   9FFFFFFFFFFFA810 ?
                                                   60000000000B6300 ?
                                                   9FFFFFFFFFFFAD90 ?
ktprbeg()+8272       call     ktprdestroy()        C000000000001026 ?
                                                   40000000025615B0 ?
                                                   000006E61 ? 000000000 ?
                                                   4000000001052E40 ?
                                                   000000000 ?
ktmmon()+10096       call     ktprbeg()            9FFFFFFFFFFFBE70 ?
                                                   9FFFFFFFFFFFADA0 ?
                                                   60000000000B6300 ?
                                                   40000000028B75A0 ?
                                                   00000EF21 ?
                                                   9FFFFFFFFFFFADD8 ?
                                                   9FFFFFFFFFFFADE0 ?
ktmSmonMain()+64     call     ktmmon()             9FFFFFFFFFFFD140 ?
ksbrdp()+2816        call     ktmSmonMain()        C000000100E1CA60 ?
                                                   C000000000000FA5 ?
                                                   000007361 ?
                                                   4000000003B5AE10 ?
                                                   C000000000000205 ?
                                                   400000000409DCD0 ?
opirip()+1136        call     ksbrdp()             9FFFFFFFFFFFD150 ?
                                                   60000000000B6300 ?
                                                   9FFFFFFFFFFFDC90 ?
                                                   4000000002863EF0 ?
                                                   000004861 ?
                                                   C000000000000B1D ?
                                                   60000000000318F0 ?
$cold_opidrv()+1408  call     opirip()             9FFFFFFFFFFFEA70 ?
                                                   000000004 ?
                                                   9FFFFFFFFFFFF090 ?
                                                   9FFFFFFFFFFFDCA0 ?
                                                   60000000000B6300 ?
                                                   C000000000000DA1 ?
sou2o()+336          call     $cold_opidrv()       000000032 ?
                                                   9FFFFFFFFFFFF090 ?
                                                   60000000000C2C78 ?
$cold_opimai_real()  call     sou2o()              9FFFFFFFFFFFF0B0 ?
+640                                               000000032 ? 000000004 ?
                                                   9FFFFFFFFFFFF090 ?
main()+368           call     $cold_opimai_real()  000000003 ? 000000000 ?
main_opd_entry()+80  call     main()               000000003 ?
                                                   9FFFFFFFFFFFF598 ?
                                                   60000000000B6300 ?
                                                   C000000000000004 ?
 

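Searching Oracle Support for ORA-00600 [15709] turned up the note "SMON may fail with ORA-00600 [15709] Errors Crashing the Instance" (Doc ID 736348.1), whose error description closely matches what we see. The note lists the call stack at the time of the error: kxfpqsrls <- kxfpqrsod <- kxfpdelqrefs <- kxfpqsod_qc_sod <- kxfpqsod <- ktprdestroy <- ktprbeg <- ktmmon. The stack in the SMON trace essentially matches it. So the problem is bug 6954722 being hit during recovery. If that patch is already installed and a similar problem still occurs, you have most likely run into another, similar bug, 9233544. Oracle really does have a lot of bugs.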
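
The comparison between the trace and the note's stack can also be done mechanically. The sketch below is a hypothetical helper, not an Oracle utility. It assumes the call-stack layout shown in the trace above (a frame line starts in the first column with "name()"), treats "$cold_" as a prefix to ignore (an assumption about this platform's symbol naming), and spells the seventh frame "ktprbeg" as it appears in the trace. It then checks whether the documented frames appear in the trace in the same order.

```python
import re

# Frames listed in the note, innermost first. "ktprbeg" follows the
# spelling in the SMON trace (assumption).
DOCUMENTED_STACK = [
    "kxfpqsrls", "kxfpqrsod", "kxfpdelqrefs", "kxfpqsod_qc_sod",
    "kxfpqsod", "ktprdestroy", "ktprbeg", "ktmmon",
]

# A frame line begins with "name()" in column 0; continuation lines
# begin with spaces, digits, or "+", so they do not match.
FRAME_RE = re.compile(r"^(\$cold_)?([A-Za-z_][A-Za-z0-9_]*)\(\)")


def extract_frames(trace_text):
    """Return the calling-location function names, innermost first."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append(m.group(2))
    return frames


def matches_in_order(frames, expected):
    """True if every expected frame occurs in frames, in the same order."""
    remaining = iter(frames)
    # "name in remaining" consumes the iterator up to the first hit,
    # so each later name must be found after the previous one.
    return all(name in remaining for name in expected)


# Frame lines taken from the trace above (argument columns trimmed).
SAMPLE_TRACE = """\
kgeasi()+416         call     kgeriv()             6000000000031770 ?
$cold_kxfpqsrls()+1  call     kgeasi()             6000000000031770 ?
168                                                9FFFFFFFFD3D2290 ?
kxfpqrsod()+1104     call     $cold_kxfpqsrls()    C0000004FDF7A838 ?
kxfpdelqrefs()+640   call     kxfpqrsod()          C0000004FDF74430 ?
kxfpqsod_qc_sod()+2  call     kxfpdelqrefs()       00000003E ? 000000001 ?
016                                                60000000000B6300 ?
kxfpqsod()+816       call     kxfpqsod_qc_sod()    000000010 ? 000000001 ?
ktprdestroy()+208    call     kxfpqsod()           C0000004FDF7A838 ?
ktprbeg()+8272       call     ktprdestroy()        C000000000001026 ?
ktmmon()+10096       call     ktprbeg()            9FFFFFFFFFFFBE70 ?
ktmSmonMain()+64     call     ktmmon()             9FFFFFFFFFFFD140 ?
"""

print(matches_in_order(extract_frames(SAMPLE_TRACE), DOCUMENTED_STACK))
```

An ordered-subsequence check is used rather than exact equality because the trace also contains the error-handling frames (ksedmp, kgeriv, kgeasi) above the documented ones and the process startup frames below them.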

Bug 6954722 affects 9.2.0.8 and 10.2.0.4 and is fixed in 10.2.0.4.2, 10.2.0.5, 11.1.0.7, and 11.2.0.1. The ways to resolve bug 6954722 are:

1.Use the following workaround

Set fast_start_parallel_rollback=false and recovery_parallelism=0

OR

2.Apply one-off Patch:6954722, if available for your platform/version here.

OR

3.Upgrade to fixed release 10.2.0.5, 11.1.0.7 or 11.2.0.1.

Bug 9233544 affects 10.2.0.4, 11.1.0.7, and 11.2.0.1 and is fixed in 11.2.0.3 and 12.1. The ways to resolve bug 9233544 are:

1.Apply patchset 11.2.0.3, in which Bug: 9233544 is fixed.

OR

2.Check if one-off Patch:9233544 is available for your release and platform here.

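We checked the system's installed patches carefully and found that patch 6954722 was already applied, which points to bug 9233544 as the cause. The options are to upgrade to 11.2.0.3 or to install the one-off patch 9233544. Upgrading to 11.2.0.3 is too big a change, so we suggested that the customer consider the one-off patch instead.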
