Handling ORA-600 [kgeade_is_0] Triggered by Changing a Parallel Execution Parameter

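A customer's database had a routine maintenance window this week, so we used the downtime to change the PARALLEL_EXECUTION_MESSAGE_SIZE parameter. After the change, we hit ORA-00600 [KGEADE_IS_0] during the restart.

First, why change PARALLEL_EXECUTION_MESSAGE_SIZE at all? The parameter sets the size of the messages exchanged during parallel execution. On a freshly installed 10g database the default is 2152 (it can also be 2048), and Oracle best practice recommends raising it to 8192. In 11g the default is already 16K, which is enough for most workloads. A larger value gives better parallel performance, but it also takes more memory from the shared pool. One more benefit: for parallel recovery or standby recovery, raising the value above 4096 can speed up recovery by at least 20%.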
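Before changing anything, it helps to see what each instance is currently running with. The query below is a minimal sketch: it reads the local V$PARAMETER view, so it has to be run on every instance in turn rather than once through a GV$ view.

```sql
-- Run on each RAC instance separately (local view, no cross-instance query).
-- ISDEFAULT shows whether the value was ever set explicitly;
-- ISSYS_MODIFIABLE = 'FALSE' means the parameter is static and a change
-- only takes effect after the instance is restarted.
SELECT name,
       value,
       isdefault,
       issys_modifiable
FROM   v$parameter
WHERE  name = 'parallel_execution_message_size';
```

If the instances report different values, they are in the mismatched state described later in this post.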

Here is what the error looked like. We changed the parameter for one node only and then restarted that instance straight away.

Sun Jul 13 16:57:58 CST 2014
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_m000_21519.trc:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Sun Jul 13 16:57:59 CST 2014
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mmon_21339.trc:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Sun Jul 13 16:58:00 CST 2014
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mmon_21339.trc:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Sun Jul 13 16:58:00 CST 2014
Trace dumping is performing id=[cdmp_20140713165800]
Sun Jul 13 16:58:01 CST 2014
Trace dumping is performing id=[cdmp_20140713165801]
Sun Jul 13 16:58:07 CST 2014
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_m000_21519.trc:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Sun Jul 13 16:58:07 CST 2014
Trace dumping is performing id=[cdmp_20140713165807]

*** 2014-07-13 16:57:58.781
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Current SQL statement for this session:
select tablespace_id, rfno, allocated_space, file_size, file_maxsize, changescn_base, changescn_wrap, flag from GV$FILESPACE_USAGE where inst_id != :inst and (changescn_wrap >= :w or (changescn_wrap = :w and changescn_base >= :b))

*** 2014-07-13 16:57:59.274
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT INSTANCE_NAME, HOST_NAME, NVL(GVI_STARTUP_TIME, SYSTIMESTAMP) - INTERVAL '1' SECOND AS SHUTDOWN_TIME FROM (SELECT RRI.INSTANCE_NAME AS INSTANCE_NAME, RRI.HOST_NAME AS HOST_NAME, FROM_TZ(RRI.STARTUP_TIME
, '+00:00') AS RRI_STARTUP_TIME, DBMS_HA_ALERTS_PRVT.INSTANCE_STARTUP_TIMESTAMP_TZ(GVI.STARTUP_TIME) AS GVI_STARTUP_TIME FROM RECENT_RESOURCE_INCARNATIONS$ RRI LEFT OUTER JOIN GV$INSTANCE GVI ON GVI.INSTANCE_N
AME = RRI.RESOURCE_NAME WHERE RRI.RESOURCE_TYPE = 'INSTANCE' AND :B2 = RRI.DB_UNIQUE_NAME AND :B1 = RRI.DB_DOMAIN) WHERE GVI_STARTUP_TIME IS NULL OR GVI_STARTUP_TIME > RRI_STARTUP_TIME GROUP BY INSTANCE_NAME, 
HOST_NAME, GVI_STARTUP_TIME
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x7de705a8       301  package body SYS.DBMS_HA_ALERTS_PRVT
0x7de64740         1  anonymous block

As you can see, every failing statement was a query against a GV$ view. Next, let's look at the call stack at the time of the error.

ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF778810B0 ? 7FFF77881110 ?
                                                   7FFF77881050 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFF778810B0 ? 7FFF77881110 ?
                                                   7FFF77881050 ? 000000000 ?
ksfdmp()+63          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFF778810B0 ? 7FFF77881110 ?
                                                   7FFF77881050 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             006AE9A20 ? 000000003 ?
                                                   7FFF778810B0 ? 7FFF77881110 ?
                                                   7FFF77881050 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            006AE9A20 ? 2B763E0B0040 ?
                                                   7FFF77881110 ? 7FFF77881050 ?
                                                   000000000 ? 000000000 ?
kgeade()+501         call     kgeasnmierr()        006AE9A20 ? 2B763E0B0040 ?
                                                   7FFF77881110 ? 7FFF77881050 ?
                                                   000000000 ? 000000000 ?
kgerev()+58          call     kgeade()             2B763E0B0040 ? 006AE9A20 ?
                                                   2B763E0B0040 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kserec0()+186        call     kgerev()             006AE9A20 ? 2B763E0B0040 ?
                                                   000000000 ? 000000000 ?
                                                   7FFF778821A0 ? 000000000 ?
kxfpg1sg()+2014      call     kserec0()            006AE9A20 ? 000000001 ?
                                                   000000029 ? 7FFF77881F40 ?
                                                   000000000 ? 388B519840 ?
kxfpgsg()+2098       call     kxfpg1sg()           08364D278 ? 000000001 ?
                                                   7FFF778822B0 ? 7FFF77881F40 ?
                                                   08364CC48 ? 2B7600000001 ?
kxfrAllocSlaves()+3  call     kxfpgsg()            000000005 ? 000000001 ?
51                                                 000000001 ? 000000001 ?
                                                   3E0A254800000001 ?
                                                   2B763E0A2548 ?
kxfrialo()+2111      call     kxfrAllocSlaves()    00005322E ? 2B763E5726C0 ?
                                                   000000001 ? 7FFF00000001 ?
                                                   7FFF00000001 ? 000000001 ?
kxfralo()+313        call     kxfrialo()           00005322E ? 2B763E5726C0 ?
                                                   000000001 ? 07DAA7230 ?
                                                   2B763E572768 ? 7FFF77880000 ?
qerpx_rowsrc_start(  call     kxfralo()            00005322E ? 2B763E5726C0 ?
)+3892                                             000000001 ? 07DAA7230 ?
                                                   2B763E572768 ? 000000000 ?
qerpxStart()+234     call     qerpx_rowsrc_start(  7FFF77883280 ? 000000001 ?
                              )                    000000001 ? 07DAA8910 ?
                                                   100000001 ? 000000000 ?
selexe()+667         call     qerpxStart()         000000001 ? 000003F60 ?
                                                   000000001 ? 07DAA8910 ?
                                                   100000001 ? 000000000 ?
opiexe()+4687        call     selexe()             07DACBB38 ? 7FFF77883F60 ?
                                                   7FFF77883F60 ? 07DACBB38 ?
                                                   100000001 ? 000000000 ?
kpoal8()+2295        call     opiexe()             000000049 ? 000000003 ?
                                                   7FFF77884428 ? 000000003 ?
                                                   100000001 ? 000000000 ?
opiodr()+1184        call     kpoal8()             00000005E ? 000000000 ?
                                                   7FFF77887EF8 ? 000000003 ?
                                                   83B7000000000001 ?
                                                   000000000 ?
kpoodrc()+38         call     opiodr()             00000005E ? 000000000 ?
                                                   7FFF77887EF8 ? 000000000 ?
                                                   005BEBDF0 ? 000000000 ?
rpiswu2()+409        call     kpoodrc()            7FFF77885440 ? 000000000 ?
                                                   7FFF77887EF8 ? 000000000 ?
                                                   005BEBDF0 ? 000000000 ?
kpoodr()+554         call     rpiswu2()            083B7ABF0 ? 000000000 ?
                                                   2B763E0F0CBC ? 000000002 ?
                                                   2B763E0F0CFC ? 000000000 ?
upirtrc()+2101       call     kpoodr()             2B763E342E20 ? 00000005E ?
                                                   7FFF77887EF8 ? 000000000 ?
                                                   2B763E0F0CFC ? 000000000 ?
kpurcsc()+125        call     upirtrc()            2B763E342E20 ? 00000005E ?
                                                   7FFF77887EF8 ? 7FFF77888060 ?
                                                   7FFF77888FD0 ? 003C558C6 ?
kpuexecv8()+1705     call     kpurcsc()            7FFF778897D0 ? 00000005E ?
                                                   7FFF77887EF8 ? 7FFF77888060 ?
                                                   7FFF77888FD0 ? 003C558C6 ?
kpuexec()+2643       call     kpuexecv8()          2B763E0FE958 ? 2B763E33F4C0 ?
                                                   2B763E33F540 ? 000000000 ?
                                                   000000000 ? 7FFF7788A8C4 ?
OCIStmtExecute()+41  call     kpuexec()            000000001 ? 2B763E33F4C0 ?
                                                   2B763E342DB0 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ktte_aggregate_finf  call     OCIStmtExecute()     000000001 ? 2B763E33F4C0 ?
o()+3133                                           2B763E342DB0 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ktte_monitor_tsth()  call     ktte_aggregate_finf  7FFF7788B780 ? 000000001 ?
+788                          o()                  000000009 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ktte_threshold_slav  call     ktte_monitor_tsth()  7FFF7788B780 ? 000000001 ?
e()+183                                            000000009 ? 000000001 ?
                                                   000000000 ? 000000000 ?
kebm_slave_main()+2  call     ktte_threshold_slav  07F63B200 ? 000000001 ?
21                            e()                  000000000 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksvrdp()+1159        call     kebm_slave_main()    07F63B200 ? 07F63B200 ?
                                                   000000000 ? 000000001 ?
                                                   000000000 ? 000000000 ?
opirip()+748         call     ksvrdp()             07F63B200 ? 07F63B200 ?
                                                   000000000 ? 000000001 ?
                                                   000000000 ? 000000000 ?
opidrv()+583         call     opirip()             000000032 ? 000000004 ?
                                                   7FFF7788D298 ? 000000001 ?
                                                   000000000 ? 000000000 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FFF7788D298 ? 000000001 ?
                                                   000000000 ? 000000000 ?
opimai_real()+317    call     sou2o()              7FFF7788D270 ? 000000032 ?
                                                   000000004 ? 7FFF7788D298 ?
                                                   000000000 ? 000000000 ?
main()+116           call     opimai_real()        000000003 ? 7FFF7788D300 ?
                                                   000000004 ? 7FFF7788D298 ?
                                                   000000000 ? 000000000 ?
__libc_start_main()  call     main()               000000003 ? 7FFF7788D300 ?
+244                                               000000004 ? 7FFF7788D298 ?
                                                   000000000 ? 000000000 ?
_start()+41          call     __libc_start_main()  00072D108 ? 000000001 ?
                                                   7FFF7788D458 ? 000000000 ?
                                                   000000000 ? 000000003 ?

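According to ORA-600 [kgeade_is_0] In A Real Application Cluster (RAC) Environment (Doc ID 797182.1), any trace file whose call stack looks like "kxfpg1sg kxfpgsg kxfrAllocSlaves kxfrialo kxfralo qerpx_rowsrc_start" matches bug 8592375. The fix is simple: shut down both instances, set the parameter to the same value on both, and start them again. In our case one instance was still running with the old value while the restarted instance came up with the new one, and that mismatch is what triggers the error. The other option is to install the patch, although it looks as if the patch is aimed at standby databases: 8592375: PHSB: READABLE STANDBY REPORTED ORA-00700:[KGEADE_IS_0].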
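The "same value on both instances" fix can be sketched as follows. This assumes the RAC instances share a single spfile, and 8192 is simply the value recommended earlier for 10g; neither detail comes from the MOS note, so adjust both to your environment.

```sql
-- Record one value for all instances in the (assumed shared) spfile.
-- The parameter is static, so SCOPE = SPFILE is required and nothing
-- changes until the instances are restarted.
ALTER SYSTEM SET parallel_execution_message_size = 8192
  SCOPE = SPFILE SID = '*';

-- Look for instance-specific entries that would override the '*' entry.
SELECT sid, name, value
FROM   v$spparameter
WHERE  name = 'parallel_execution_message_size'
AND    isspecified = 'TRUE';

-- After ALL instances have been shut down and started again, confirm the
-- running value on each instance (local view, run once per instance).
SELECT value
FROM   v$parameter
WHERE  name = 'parallel_execution_message_size';
```

The key point is that every instance must be down before any of them is started with the new value. A rolling restart leaves old and new values running side by side, which is the situation that produced the error here.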

Reference: ORA-600 [kgeade_is_0] In A Real Application Cluster (RAC) Environment (Doc ID 797182.1)
