问题现象如下所示:
Block recovery completed at rba 143322.154237.16, scn 0.3023859895 ORACLE Instance xsdb1 (pid = 13) - Error 600 encountered while recovering transaction (3, 16) on object 257736. Wed Apr 4 16:12:59 2012 Errors in file /oracle/admin/xsdb/bdump/xsdb1_smon_573464.trc: ORA-00600: internal error code, arguments: [kdoirp-1], [2], [], [], [], [], [], []
我们主要看这个错误Error 600 encountered while recovering transaction (3, 16) on object 257736,这里表象是对象257736出现了问题,需要做事物的恢复.然后查询trace文件,提示是SMON: about to recover undo segment 3,这里体现也就是回滚段出现了错误.需要做恢复.
*** SESSION ID:(544.1) 2012-04-04 19:39:39.786 SMON: about to recover undo segment 3 *** 2012-04-04 19:39:39.800 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [kdoirp-1], [2], [], [], [], [], [], []
因为是RAC环境, 通过查询v$rollstat的情况,确定是哪个回滚段出现问题.
SQL > SELECT segment_name,tablespace_name,owner,status FROM dba_rollback_segs; SEGMENT_NAME TABLESPACE_NAME OWNER STATUS --------------- ------------------- ------ ------- SYSTEM SYSTEM SYS ONLINE _SYSSMU1$ UNDOTBS1 PUBLIC ONLINE _SYSSMU2$ UNDOTBS1 PUBLIC ONLINE _SYSSMU3$ UNDOTBS1 PUBLIC ONLINE _SYSSMU4$ UNDOTBS1 PUBLIC ONLINE _SYSSMU5$ UNDOTBS1 PUBLIC ONLINE _SYSSMU6$ UNDOTBS1 PUBLIC ONLINE _SYSSMU7$ UNDOTBS1 PUBLIC ONLINE _SYSSMU8$ UNDOTBS1 PUBLIC ONLINE _SYSSMU9$ UNDOTBS1 PUBLIC ONLINE _SYSSMU10$ UNDOTBS1 PUBLIC ONLINE _SYSSMU11$ UNDOTBS2 PUBLIC ONLINE _SYSSMU12$ UNDOTBS2 PUBLIC ONLINE _SYSSMU13$ UNDOTBS2 PUBLIC ONLINE _SYSSMU14$ UNDOTBS2 PUBLIC ONLINE _SYSSMU15$ UNDOTBS2 PUBLIC ONLINE _SYSSMU16$ UNDOTBS2 PUBLIC ONLINE _SYSSMU17$ UNDOTBS2 PUBLIC ONLINE _SYSSMU18$ UNDOTBS2 PUBLIC ONLINE _SYSSMU19$ UNDOTBS2 PUBLIC ONLINE _SYSSMU20$ UNDOTBS2 PUBLIC ONLINE _SYSSMU28$ UNDOTBS2 PUBLIC OFFLINE _SYSSMU29$ UNDOTBS2 PUBLIC OFFLINE _SYSSMU30$ UNDOTBS2 PUBLIC OFFLINE _SYSSMU31$ UNDOTBS2 PUBLIC OFFLINE _SYSSMU32$ UNDOTBS2 PUBLIC OFFLINE _SYSSMU33$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU34$ UNDOTBS2 PUBLIC OFFLINE _SYSSMU35$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU36$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU37$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU38$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU39$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU40$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU41$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU42$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU21$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU22$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU23$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU24$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU25$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU26$ UNDOTBS1 PUBLIC OFFLINE _SYSSMU27$ UNDOTBS1 PUBLIC OFFLINE
从上述查询中,我们可以确定回滚段_SYSSMU3$是属于undotbs1的,接下来我们的思路是要通过参数文件把该回滚段给屏蔽掉,然后重启,再删除该回滚段.我生成了pfile文件,并添加了下列两个参数.
_offline_rollback_segments= _SYSSMU3$ _corrupted_rollback_segments= _SYSSMU3$
在参数文件中设置了这两个参数,彻底的把回滚段_SYSSMU3$给屏蔽了,重启数据库,一切都正常了,SMON进程不在去找需要恢复的事务.接下来我就开始删除_SYSSMU3$,但是在删除的过程中,遇到了新问题.我使用drop rollback的命令,数据库开始报另外一个ORA-00600[kddummy_blkchk]错误.如下所示.
drop rollback segment "_SYSSMU3$" Thu Apr 5 01:22:02 2012 Errors in file /oracle/admin/xsdb/udump/xsdb1_ora_1794114.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
再删除该回滚段的时候,数据库crash掉了,一直报ORA-00600[kddummy_blkchk],[2],[41],[38508]错误,该错误的参数含义是ORA-00600[file#][block#][heck code],file#即问题所在的文件号,block#即问题块的块号,check code是从kcbchk()内部返回的错误代码,此时我们发现file2的block41出现了问题,通过查询mos网站,How to Resolve ORA-00600[kddummy_blkchk] [ID 1342443.1],需要设置db_block_checksum和 db_block_checking为false.此时我通过v$datafile进行了确认,file2就是我们的undotbs1.
db_block_checking=false; db_block_checksum=false;
经过设置这两个参数,数据库可以打开了.这里有一篇文章是介绍这两个参数的.db_block_checking和db_block_checksum深入研究.因为undotbs1有坏块,所以我决定重建undo来解决该问题.当打开数据库之后,我就开始重建undo,剩余操作如下所示:
create undo tablespace undotbs11 datafile '/dev/rdisk58' size 5000m; alter system set undo_tablespace=undotbs11 scope=both sid='xsdb1';
切换完成之后,需要查询rollstat的状态,通过下列语句查询,如果status都是online的话,就可以彻底的删除掉undotbs1了.
select usn,xacts,status,rssize/1024/1024/1024,hwmsize/1024/1024/1024,shrinks from gv$rollstat order by rssize;
后来询问客户这个错误主要产生的原因:是因为客户在做shrink的时候undo出现了坏块.最终引起了该问题的产生,重新切换undo后一切正常!
参考文档:
db_block_checking和db_block_checksum深入研究
How to Resolve ORA-00600[kddummy_blkchk] [ID 1342443.1]
Post a Comment