cpu负载过高和kksfbc child completion等待事件.

问题是这样的,服务器是hp-unix,数据库版本是10.2.0.3,检查服务器的时候用top查看,发现有几个进程cpu一直是100%.如下所示:

System: oradb2                                        Wed Apr 18 13:35:33 2012
Load averages: 0.90, 0.93, 0.93
424 processes: 344 sleeping, 80 running
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    0.93  75.9%   0.0%   2.8%  21.3%   0.0%   0.0%   0.0%   0.0%
 1    0.93  83.5%   0.0%   2.2%  14.3%   0.0%   0.0%   0.0%   0.0%
 2    0.90  70.6%   0.0%   3.4%  26.0%   0.0%   0.0%   0.0%   0.0%
 3    0.90  83.7%   0.0%   3.8%  12.5%   0.0%   0.0%   0.0%   0.0%
 4    0.93  81.9%   0.0%   3.0%  15.1%   0.0%   0.0%   0.0%   0.0%
 5    0.90  69.2%   0.0%   4.8%  26.0%   0.0%   0.0%   0.0%   0.0%
 6    0.87  59.6%   0.0%   7.8%  32.6%   0.0%   0.0%   0.0%   0.0%
 7    0.87  81.3%   0.0%   4.0%  14.7%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.90  75.7%   0.0%   4.0%  20.3%   0.0%   0.0%   0.0%   0.0%

System Page Size: 64Kbytes
Memory: 12886848K (5738368K) real, 14577152K (6735232K) virtual, 7254400K free  Page# 1/20

CPU TTY  PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU COMMAND
 4   ? 13319 oracle   152 20   278M   155M run   464779:36 98.45 98.28 evmd.bin
 4   ? 15425 oracle   241 20 16001M 13632K run   177954:10 97.07 96.90 oracleqxsm1
 3   ? 11033 oracle   239 20 16001M 13632K run   203549:16 96.82 96.65 oracleqxsm1
 0   ? 22968 oracle   241 20 16001M 13632K run   181274:44 96.70 96.53 oracleqxsm1
 6   ?  1776 oracle   241 20 16001M 13632K run   177366:01 94.99 94.82 oracleqxsm1
 5   ? 11710 oracle   152 20 16011M 21952K run   2094:50 44.69 44.61 oracleqxsm1
 2   ? 11985 oracle   241 20 16002M 14400K run   3741:36 20.28 20.24 oracleqxsm1
 7   ? 13101 oracle   152 20 16002M 14784K run    419:53 16.00 15.97 oracleqxsm1
 2   ? 14720 oracle   152 20 16002M 14784K run   5291:05 15.85 15.82 oracleqxsm1
 7   ? 13197 oracle   152 20 16002M 14784K run    418:00 15.73 15.71 oracleqxsm1
 4   ?  6998 oracle   154 20 16005M 17728K sleep 1668:38  4.28  4.27 ora_lmon_qxsm1
 4   ?  9867 oracle   154 20 16004M 17024K sleep 1421:01  3.83  3.82 oracleqxsm1
 3   ?  6023 oracle   154 20 16002M 14464K sleep   80:50  2.32  2.31 oracleqxsm1
 6   ?  3859 oracle   154 20 16002M 14464K sleep   41:08  2.24  2.24 oracleqxsm1
 5   ? 12623 oracle   154 20 16002M 14656K sleep   48:49  2.06  2.05 oracleqxsm1
 4   ? 21768 oracle   154 20 16002M 14720K sleep  100:10  1.89  1.89 oracleqxsm1
 4   ?  7002 oracle   152 20 16015M 27968K run   28036:37  1.84  1.84 ora_lms0_qxsm1
 4   ?  7004 oracle   152 20 16015M 27968K run   27220:29  1.63  1.62 ora_lms1_qxsm1
 6   ?    71 root     152 20 96000K 32000K run   5572:37  1.47  1.46 vxfsd
 7   ?  2021 root     152 20 43648K  6272K run   6580:11  1.33  1.33 cimprovagt
 4   ? 20951 oracle   154 20 16001M 13056K sleep    9:35  1.27  1.27 oracleqxsm1

于是我就登陆到sqlplus上查询这些会话引起的原因,一般是去看等待事件.这里出现了一个很奇怪的等待事件kksfbc child completion.如下所示:

SQL> SELECT SID,SERIAL#, USERNAME,MACHINE,PREV_SQL_ID,sql_id,status,event FROM v$session b
  2  WHERE b.paddr = (SELECT addr
  4  FROM v$process c
  5  WHERE c.spid = 15425);

 SID    SERIAL# USERNAME    MACHINE       PREV_SQL_ID   SQL_ID        STATUS   EVENT
---- ---------- --------------------------------------- ------------- -------- -----------------------
 202      17280 YDSMSRUN    server-slot10 d4ggk0agny9tf a14cxxh70u8hy ACTIVE   kksfbc child completion

这种奇怪的问题也只好上MOS上寻找答案了.查到了这是bug 6795880造成的.当产生一个kksfbc child completion的等待之后,这个会话将开始无休止的spins中,也就是自旋,这种自旋发生在堆栈调用从kksSearchChildList -> kkshgnc之间,在kksSearchChildList函数这里陷入到了一个无限的循环当中.也就是应用程序当中常见的死循环.

解决方法:

1.Apply patch 8575528

2.设置_cursor_features_enabled=10.

出现这个问题之后,session会话不会自动消失掉,同时停止实例也是没办法停下来的,只有手动去杀掉这些进程,数据库实例才能shutdown.

参考文档:

Session or job spins on ‘kksfbc child completion’ wait [ID 1354066.1]

分享到: 更多

Post a Comment

Your email is never published nor shared. Required fields are marked *