记一次ORA-00600[kdoirp-1]和ORA-00600[kddummy_blkchk]问题处理

问题现象如下所示:

Block recovery completed at rba 143322.154237.16, scn 0.3023859895
ORACLE Instance xsdb1 (pid = 13) - Error 600 encountered while recovering transaction (3, 16) on object 257736.
Wed Apr  4 16:12:59 2012
Errors in file /oracle/admin/xsdb/bdump/xsdb1_smon_573464.trc:
ORA-00600: internal error code, arguments: [kdoirp-1], [2], [], [], [], [], [], []

我们主要看这个错误Error 600 encountered while recovering transaction (3, 16) on object 257736,这里表象是对象257736出现了问题,需要做事物的恢复.然后查询trace文件,提示是SMON: about to recover undo segment 3,这里体现也就是回滚段出现了错误.需要做恢复.

*** SESSION ID:(544.1) 2012-04-04 19:39:39.786
SMON: about to recover undo segment 3
*** 2012-04-04 19:39:39.800
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdoirp-1], [2], [], [], [], [], [], []

因为是RAC环境, 通过查询v$rollstat的情况,确定是哪个回滚段出现问题.

SQL > SELECT segment_name,tablespace_name,owner,status FROM dba_rollback_segs;
SEGMENT_NAME    TABLESPACE_NAME     OWNER  STATUS
--------------- ------------------- ------ -------
SYSTEM          SYSTEM              SYS    ONLINE
_SYSSMU1$       UNDOTBS1            PUBLIC ONLINE
_SYSSMU2$       UNDOTBS1            PUBLIC ONLINE
_SYSSMU3$       UNDOTBS1            PUBLIC ONLINE
_SYSSMU4$       UNDOTBS1            PUBLIC ONLINE
_SYSSMU5$       UNDOTBS1            PUBLIC ONLINE
_SYSSMU6$       UNDOTBS1            PUBLIC ONLINE
_SYSSMU7$       UNDOTBS1            PUBLIC ONLINE
_SYSSMU8$       UNDOTBS1            PUBLIC ONLINE
_SYSSMU9$       UNDOTBS1            PUBLIC ONLINE
_SYSSMU10$      UNDOTBS1            PUBLIC ONLINE
_SYSSMU11$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU12$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU13$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU14$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU15$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU16$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU17$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU18$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU19$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU20$      UNDOTBS2            PUBLIC ONLINE
_SYSSMU28$      UNDOTBS2            PUBLIC OFFLINE
_SYSSMU29$      UNDOTBS2            PUBLIC OFFLINE
_SYSSMU30$      UNDOTBS2            PUBLIC OFFLINE
_SYSSMU31$      UNDOTBS2            PUBLIC OFFLINE
_SYSSMU32$      UNDOTBS2            PUBLIC OFFLINE
_SYSSMU33$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU34$      UNDOTBS2            PUBLIC OFFLINE
_SYSSMU35$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU36$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU37$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU38$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU39$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU40$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU41$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU42$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU21$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU22$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU23$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU24$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU25$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU26$      UNDOTBS1            PUBLIC OFFLINE
_SYSSMU27$      UNDOTBS1            PUBLIC OFFLINE

从上述查询中,我们可以确定回滚段_SYSSMU3$是属于undotbs1的,接下来我们的思路是要通过参数文件把该回滚段给屏蔽掉,然后重启,再删除该回滚段.我生成了pfile文件,并添加了下列两个参数.

_offline_rollback_segments= _SYSSMU3$
_corrupted_rollback_segments= _SYSSMU3$

在参数文件中设置了这两个参数,彻底的把回滚段_SYSSMU3$给屏蔽了,重启数据库,一切都正常了,SMON进程不在去找需要恢复的事务.接下来我就开始删除_SYSSMU3$,但是在删除的过程中,遇到了新问题.我使用drop rollback的命令,数据库开始报另外一个ORA-00600[kddummy_blkchk]错误.如下所示.

drop rollback segment "_SYSSMU3$"
Thu Apr  5 01:22:02 2012
Errors in file /oracle/admin/xsdb/udump/xsdb1_ora_1794114.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []

再删除该回滚段的时候,数据库crash掉了,一直报ORA-00600[kddummy_blkchk],[2],[41],[38508]错误,该错误的参数含义是ORA-00600[file#][block#][heck code],file#即问题所在的文件号,block#即问题块的块号,check code是从kcbchk()内部返回的错误代码,此时我们发现file2的block41出现了问题,通过查询mos网站,How to Resolve ORA-00600[kddummy_blkchk] [ID 1342443.1],需要设置db_block_checksum和 db_block_checking为false.此时我通过v$datafile进行了确认,file2就是我们的undotbs1.

db_block_checking=false;
db_block_checksum=false;

经过设置这两个参数,数据库可以打开了.这里有一篇文章是介绍这两个参数的.db_block_checking和db_block_checksum深入研究.因为undotbs1有坏块,所以我决定重建undo来解决该问题.当打开数据库之后,我就开始重建undo,剩余操作如下所示:

create undo tablespace undotbs11 datafile '/dev/rdisk58' size 5000m;
alter system set undo_tablespace=undotbs11 scope=both sid='xsdb1';

切换完成之后,需要查询rollstat的状态,通过下列语句查询,如果status都是online的话,就可以彻底的删除掉undotbs1了.

select usn,xacts,status,rssize/1024/1024/1024,hwmsize/1024/1024/1024,shrinks from gv$rollstat order by rssize;

后来询问客户这个错误主要产生的原因:是因为客户在做shrink的时候undo出现了坏块.最终引起了该问题的产生,重新切换undo后一切正常!

参考文档:

db_block_checking和db_block_checksum深入研究

How to Resolve ORA-00600[kddummy_blkchk] [ID 1342443.1]


分享到: 更多

Post a Comment

Your email is never published nor shared. Required fields are marked *