问题是这样的,服务器是hp-unix,数据库版本是10.2.0.3,检查服务器的时候用top查看,发现有几个进程cpu一直是100%.如下所示:
System: oradb2 Wed Apr 18 13:35:33 2012 Load averages: 0.90, 0.93, 0.93 424 processes: 344 sleeping, 80 running Cpu states: CPU LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS 0 0.93 75.9% 0.0% 2.8% 21.3% 0.0% 0.0% 0.0% 0.0% 1 0.93 83.5% 0.0% 2.2% 14.3% 0.0% 0.0% 0.0% 0.0% 2 0.90 70.6% 0.0% 3.4% 26.0% 0.0% 0.0% 0.0% 0.0% 3 0.90 83.7% 0.0% 3.8% 12.5% 0.0% 0.0% 0.0% 0.0% 4 0.93 81.9% 0.0% 3.0% 15.1% 0.0% 0.0% 0.0% 0.0% 5 0.90 69.2% 0.0% 4.8% 26.0% 0.0% 0.0% 0.0% 0.0% 6 0.87 59.6% 0.0% 7.8% 32.6% 0.0% 0.0% 0.0% 0.0% 7 0.87 81.3% 0.0% 4.0% 14.7% 0.0% 0.0% 0.0% 0.0% --- ---- ----- ----- ----- ----- ----- ----- ----- ----- avg 0.90 75.7% 0.0% 4.0% 20.3% 0.0% 0.0% 0.0% 0.0% System Page Size: 64Kbytes Memory: 12886848K (5738368K) real, 14577152K (6735232K) virtual, 7254400K free Page# 1/20 CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND 4 ? 13319 oracle 152 20 278M 155M run 464779:36 98.45 98.28 evmd.bin 4 ? 15425 oracle 241 20 16001M 13632K run 177954:10 97.07 96.90 oracleqxsm1 3 ? 11033 oracle 239 20 16001M 13632K run 203549:16 96.82 96.65 oracleqxsm1 0 ? 22968 oracle 241 20 16001M 13632K run 181274:44 96.70 96.53 oracleqxsm1 6 ? 1776 oracle 241 20 16001M 13632K run 177366:01 94.99 94.82 oracleqxsm1 5 ? 11710 oracle 152 20 16011M 21952K run 2094:50 44.69 44.61 oracleqxsm1 2 ? 11985 oracle 241 20 16002M 14400K run 3741:36 20.28 20.24 oracleqxsm1 7 ? 13101 oracle 152 20 16002M 14784K run 419:53 16.00 15.97 oracleqxsm1 2 ? 14720 oracle 152 20 16002M 14784K run 5291:05 15.85 15.82 oracleqxsm1 7 ? 13197 oracle 152 20 16002M 14784K run 418:00 15.73 15.71 oracleqxsm1 4 ? 6998 oracle 154 20 16005M 17728K sleep 1668:38 4.28 4.27 ora_lmon_qxsm1 4 ? 9867 oracle 154 20 16004M 17024K sleep 1421:01 3.83 3.82 oracleqxsm1 3 ? 6023 oracle 154 20 16002M 14464K sleep 80:50 2.32 2.31 oracleqxsm1 6 ? 3859 oracle 154 20 16002M 14464K sleep 41:08 2.24 2.24 oracleqxsm1 5 ? 12623 oracle 154 20 16002M 14656K sleep 48:49 2.06 2.05 oracleqxsm1 4 ? 21768 oracle 154 20 16002M 14720K sleep 100:10 1.89 1.89 oracleqxsm1 4 ? 7002 oracle 152 20 16015M 27968K run 28036:37 1.84 1.84 ora_lms0_qxsm1 4 ? 7004 oracle 152 20 16015M 27968K run 27220:29 1.63 1.62 ora_lms1_qxsm1 6 ? 71 root 152 20 96000K 32000K run 5572:37 1.47 1.46 vxfsd 7 ? 2021 root 152 20 43648K 6272K run 6580:11 1.33 1.33 cimprovagt 4 ? 20951 oracle 154 20 16001M 13056K sleep 9:35 1.27 1.27 oracleqxsm1
于是我就登陆到sqlplus上查询这些会话引起的原因,一般是去看等待事件.这里出现了一个很奇怪的等待事件kksfbc child completion.如下所示:
SQL> SELECT SID,SERIAL#, USERNAME,MACHINE,PREV_SQL_ID,sql_id,status,event FROM v$session b 2 WHERE b.paddr = (SELECT addr 4 FROM v$process c 5 WHERE c.spid = 15425); SID SERIAL# USERNAME MACHINE PREV_SQL_ID SQL_ID STATUS EVENT ---- ---------- --------------------------------------- ------------- -------- ----------------------- 202 17280 YDSMSRUN server-slot10 d4ggk0agny9tf a14cxxh70u8hy ACTIVE kksfbc child completion
这种奇怪的问题也只好上MOS上寻找答案了.查到了这是bug 6795880造成的.当产生一个kksfbc child completion的等待之后,这个会话将开始无休止的spins中,也就是自旋,这种自旋发生在堆栈调用从kksSearchChildList -> kkshgnc之间,在kksSearchChildList函数这里陷入到了一个无限的循环当中.也就是应用程序当中常见的死循环.
解决方法:
1.Apply patch 8575528
2.设置_cursor_features_enabled=10.
出现这个问题之后,session会话不会自动消失掉,同时停止实例也是没办法停下来的,只有手动去杀掉这些进程,数据库实例才能shutdown.
参考文档:
Session or job spins on ‘kksfbc child completion’ wait [ID 1354066.1]
Post a Comment