应用会话在ybtdb2数据库执行Truncate操作无法成功,数据库出现异常等待事件“enq: RO – fast object reuse”。出现异常等待期间数据库主机资源使用正常。为了进一步分析问题,对该进程做了errorstack,errorstack trace文件如下:
SO: 70000050fc93af8, type: 4, owner: 70000050f8d63d8, flag: INIT/-/-/0x00 (session) sid: 1103 trans: 7000004f99b71a8, creator: 70000050f8d63d8, flag: (100041) USR/- BSY/-/-/-/-/- DID: 0002-0227-00001ECA, short-term DID: 0002-0227-00001ECB txn branch: 0 oct: 170, prv: 0, sql: 7000004cd3ff508, psql: 7000004cd3ff508, user: 35/YBT service name: ybtdb O/S info: user: ybtcenter, term: , ospid: 15266, machine: ybtapp2 program: loadidsdata@ybtapp2 (TNS V1-V3) client info: 10.190.115.2 application name: loadidsdata@ybtapp2 (TNS V1-V3), hash value=227635376 waiting for 'enq: RO - fast object reuse' blocking sess=0x0 seq=15114 wait_time=0 seconds since wait started=278032 name|mode=524f0006, 2=20227, 0=1 Dumping Session Wait History for 'enq: RO - fast object reuse' count=1 wait_time=488301 name|mode=524f0006, 2=20227, 0=1 for 'enq: RO - fast object reuse' count=1 wait_time=488295 name|mode=524f0006, 2=20227, 0=1 for 'enq: RO - fast object reuse' count=1 wait_time=488299 name|mode=524f0006, 2=20227, 0=1 for 'enq: RO - fast object reuse' count=1 wait_time=488293 name|mode=524f0006, 2=20227, 0=1 for 'enq: RO - fast object reuse' count=1 wait_time=488292 name|mode=524f0006, 2=20227, 0=1 for 'enq: RO - fast object reuse' count=1 wait_time=488290 name|mode=524f0006, 2=20227, 0=1 for 'enq: RO - fast object reuse' count=1 wait_time=488291 name|mode=524f0006, 2=20227, 0=1 for 'enq: RO - fast object reuse' count=1 wait_time=488292 name|mode=524f0006, 2=20227, 0=1 for 'enq: RO - fast object reuse' count=1 wait_time=488296 name|mode=524f0006, 2=20227, 0=1 for 'enq: RO - fast object reuse' count=1 wait_time=488299 name|mode=524f0006, 2=20227, 0=1 temporary object counter: 0
可以看到该会话一直在等待enq: RO – fast object reuse,且等待时间较长。
----- PL/SQL Call Stack ----- object line object handle number name 7000004cde152e0 909 package body SYS.DBMS_SYS_SQL 7000004cde15990 39 package body SYS.DBMS_SQL 70000050e323ca8 7 procedure YBT.TRUNCATE_TABLE ----- Call Stack Trace ----- ksliwat+06c0 bl skgpwwait FFFFFFFFFFEB1F0 ? 000000000 ? 103AC4EF8 ? 000000000 ? FFFFFFFFFFEB1F0 ? kslwaitns_timed+002 bl ksliwat 000000000 ? 000000000 ? 4 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? kskthbwt+022c bl kslwaitns_timed 000000004 ? 000000000 ? FFFFFFFFFFEBE20 ? 000000000 ? 3784759F126D58 ? 000000061 ? FFFFFFFFFFEB530 ? 000000000 ? kslwait+00f4 bl kskthbwt 3200000032 ? 4900000049 ? 000000000 ? 0524F0006 ? 000020227 ? 000000001 ? E00000000000E ? 6955278669552786 ? kjuscv+0ad0 bl kslwait FFFFFFFFFFEB704 ? 200000003 ? 5000001101960E8 ? 700000000 ? 000000020 ? 000000000 ? ksipcon+041c bl kjuscv 700000509346A00 ? 50000010564B0D0 ? 61FFFFFFBF ? 0FFFFFEFF ? FFFFFFFFFFEBE20 ? 000000000 ? 000000000 ? FFFFFFFFFFEBD4C ? ksqcmi+0e14 bl ksipcon 10564B8E4 ? 10564B8F0 ? 10564B928 ? 10564B914 ? 10564B908 ? 080000004 ? 10564B940 ? 10564B934 ? ksqcnv+030c bl ksqcmi 70000050B0D3DF8 ? 600000000 ? FFFF0011BD18 ? 147AE140B4A3EC8 ? FFFFFFFFFFED260 ? FFFFFFFFFFED4B0 ? 000000004 ? 110001248 ? ksqcov+0034 bl ksqcnv FFFFFFFFFFED84C ? 000000000 ? FFFFFFFFFFED844 ? 000000001 ? 000000001 ? 000020227 ? 00147AE14 ? kcbo_reuse_obj+0880 bl 01F94504 kcbrbo+00bc bl 01F93AE0 ktsstrn_segment+0f6 bl kcbrbo FFFFFFFFFFEE0C0 ? 000000000 ? c 7000004BD28E470 ? kkbtts_trunc_tbl_se bl ktsstrn_segment FFFFFFFFFFEE0C0 ? 02CC03F1D ? g+04b8 000000000 ? 0FFFEE268 ? 064232020 ? 064232020 ? kkbtrn+13f8 bl kkbtts_trunc_tbl_se 047548A48 ? 7000004BD28E318 ? g 000000000 ? 1FFFEE7A0 ? 7000004D42A5D88 ? 700000443A513F0 ? 3000000000003 ? opiexe+317c bl kkbtrn
通过观察堆栈发现是在执行Truncate语句,执行过程中采取了下列调用。
kslwaitns_timed -> kskthbwt -> kslwait -> kjuscv -> ksipcon -> ksqcmi -> ksipcon -> ksqcnv -> ksqcov -> kcbo_reuse_obj -> kcbrbo -> ktsstrn_segment -> kkbtts_trunc_tbl_se
通过metalink note进行搜索,找到了文档Process waits too long for ‘enq: RO – fast object reuse’. Cause: CKPT Process Bug 7385253 (文档 ID 785232.1),里面的文档中的Call Stack和我们的类似。文档中指出的Call Stack如下所示:
kslwaitns_timed kskthbwt kslwait ksqcmi ksqcnv ksqcov kcbo_reuse_obj kcbrbo ..etc
同时可以观察队列资源的情况。
----------resource 0x7000004c3fb9410---------------------- resname : [0x20227][0x1],[RO] GRANTED_Q : lp 70000050bce5f40 gl KJUSERCR rp 7000004c3fb9410 [0x20227][0x1],[RO] master 1 gl owner 70000050cd9d4d0 possible pid 2064446 xid 2001-0013-00000005 bast 0 rseq 15 mseq 0 history 0x95514955 open opt KJUSERDEADLOCK KJUSERIDLK CONVERT_Q: lp 700000509346a00 gl KJUSERPW rl KJUSEREX rp 7000004c3fb9410 [0x20227][0x1],[RO] master 1 gl owner 70000050fc93af8 possible pid 1159360 xid 2022-0227-00001ECA bast 0 rseq 15 mseq 0 history 0x5514955a convert opt KJUSERGETVALUE KJUSERNODEADLOCKWAIT KJUSERNODEADLOCKBLOCK
可以看到该队列和pid 2064446和pid 1159360有关。
SO: 70000050b0d3dd8, type: 5, owner: 700000481adb5a0, flag: INIT/-/-/0x00 (enqueue) RO-00020227-00000001 DID: 0002-0227-00001ECA lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x3 res: 0x70000050a15e320, mode: SSX, req: X, lock_flag: 0x0 own: 0x70000050fc93af8, sess: 0x70000050fc93af8, proc: 0x70000050f8d63d8, prv: 0x70000050a15e350 slk: 0x700000509346a00 possible owner[19.2064446] on resource RO-00020227-00000001
继续分析可以发现pid 1159360以X模式在请求持有队列RO-00020227-00000001。pid 1159360正是发起truncate的会话进程。而队列RO-00020227-00000001是被pid 2064446所持有的。而通过查询2064446发现是ckpt进程。
[jtdba@sxybtdb2 ~]$ps -ef | grep ckpt jtdba 1581064 2146438 0 14:30:52 pts/0 0:00 grep ckpt oracle 2064446 1 0 Dec 21 - 28:02 ora_ckpt_ybtdb2
问题解决
建议安装单独的patch 7385253来规避这个问题.
建议将数据库升级到11.2来规避此问题.11.2以上的版本才修复了该问题.