查询v$asm_diakgroup一直hang,等待事件enq: DD – contention.

今天同事发过来一个问题,在asm实例中执行查询v$asm_diskgroup的时候,session就hang住了。查询等待事件发现是enq: DD – contention。这个情况还是第一次遇到,不是很清楚,于是让他收集了一些trace发过来。对于hang的问题,我们一般建议做3级的hanganalyze。从hanganalyze的trace file文件中我们可以看到下列的一些信息:

Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'kfk: async disk IO'<='enq: DD - contention'
     Chain 1 Signature Hash: 0x92d661bb
 [b] Chain 2 Signature: 'kfk: async disk IO'<='enq: DD - contention'
     Chain 2 Signature Hash: 0x92d661bb
 [c] Chain 3 Signature: 'kfk: async disk IO'<='enq: DD - contention'
     Chain 3 Signature Hash: 0x92d661bb
 
===============================================================================
Non-intersecting chains:
 
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 4 (+asm.+asm4)
                   os id: 12115
              process id: 39, oracle@bi-db04 (TNS V1-V3)
              session id: 961
        session serial #: 5
    }
    is waiting for 'enq: DD - contention' with wait info:
    {
                      p1: 'name|mode'=0x44440006
                      p2: 'disk group'=0x0
                      p3: 'type'=0x1
            time in wait: 24.505926 sec
           timeout after: never
                 wait id: 9
                blocking: 0 sessions
             current sql: select group_number,name,total_mb,free_mb from v$asm_diskgroup
             short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+1865<-kslwaitctx()+163<-ksqcmi()+6145<-ksqgtlctx()+3488<-ksqgelctx()+569<-kfgUseDmt()+649<-kfgTableCb()+1550<-kfgGrpTableCbInternal()+4169<-kfgGrpTableCb()+56<-qerfxFetch()+2210<-opifch2()+3022<-kpoal8()+2939<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<-__libc_start_main()+230<-_start()+36
            wait history:
              * time between current wait and wait #1: 0.000236 sec
              1.       event: 'SQL*Net message to client'
                 time waited: 0.000003 sec
                     wait id: 8               p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.001586 sec
              2.       event: 'SQL*Net message from client'
                 time waited: 32.863016 sec
                     wait id: 7               p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000002 sec
              3.       event: 'SQL*Net message to client'
                 time waited: 0.000001 sec
                     wait id: 6               p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 4 (+asm.+asm4)
                   os id: 25609
              process id: 31, oracle@bi-db04 (TNS V1-V3)
              session id: 450
        session serial #: 2057
    }
    which is waiting for 'kfk: async disk IO' with wait info:
    {
                      p1: 'count'=0x1
                      p2: 'intr'=0x0
                      p3: 'timeout'=0xffffffff
            time in wait: 9412 min 34 sec
      heur. time in wait: 9412 min 34 sec
           timeout after: never
                 wait id: 83
                blocking: 7 sessions
             current sql: select grpnum_kfdsk, number_kfdsk, compound_kfdsk, incarn_kfdsk, mntsts_kfdsk, hdrsts_kfdsk, compat_kfdsk, mode_kfdsk, state_kfdsk, redun_kfdsk, libnam_kfdsk, totmb_kfdsk, usedmb_kfdsk, asmname_kfdsk, failname_kfdsk, label_kfdsk, path_kfdsk, udid_kfdsk, kfkid_kfdsk, crdate_kfdsk, mtdate_kfdsk, timer_kfdsk , dbcompat_k
             short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-io_queue_run()+100<-skgfospo()+185<-skgfrwat()+399<-ksfdwtio()+266<-ksfdwat_internal()+185<-kfk_reap_ufs_async_io()+101<-kfk_reap_ios_from_subsys()+297<-kfk_reap_ios()+2038<-kfk_io1()+1108<-kfkRequest()+14<-kfk_transitIO()+1696<-kfdParallelIO()+2075<-kfdDiscoverDeep()+199<-kfgDiscoverDeep()+371<-kfgDiscoverGroup()+869<-kfgTableCb()+1567<-kfdDskTableCbInternal()+233<-kfdDskTableCb()+56<-qerfxFetch()+2210<-opifch2()+3022<-kpoal8()+2939<-opio
            wait history:
              * time between current wait and wait #1: 0.000029 sec
              1.       event: 'kfk: async disk IO'
                 time waited: 0.010643 sec
                     wait id: 82              p1: 'count'=0x1
                                              p2: 'intr'=0x0
                                              p3: 'timeout'=0xffffffff
              * time between wait #1 and #2: 0.000028 sec
              2.       event: 'kfk: async disk IO'
                 time waited: 0.004111 sec
                     wait id: 81              p1: 'count'=0x1
                                              p2: 'intr'=0x0
                                              p3: 'timeout'=0xffffffff
              * time between wait #2 and #3: 0.000028 sec
              3.       event: 'kfk: async disk IO'
                 time waited: 0.001966 sec
                     wait id: 80              p1: 'count'=0x1
                                              p2: 'intr'=0x0
                                              p3: 'timeout'=0xffffffff
    }
 
Chain 1 Signature: 'kfk: async disk IO'<='enq: DD - contention'
Chain 1 Signature Hash: 0x92d661bb

这里我们可以看到我们执行查询v$asm_diskgroup的session当前的等待事件是enq: DD – contention,而这个会话被另外一个会话所阻塞。这里我们可以看到是这个session的等待事件是kfk: async disk IO。而且一直处于这个等待上。现在得依靠我们强大的metalink了,我们上去搜索了一番,发现Bug 14002325 : SELECT * FROM V$ASM_DISKGROUP HANGING.它的描述是比较符合我们这个案例的。为什么这么说呢?我们可以把我们的堆栈拿出来和bug的堆栈做对比。这是bug提供的堆栈信息。

 Short stack dump:
 
ksedsts ksdxfstk ksdxcb sspuser __sighandler read kfk_reap_ios_from_subsys 
kfk_reap_ios kfk_io1 kfkRequest kfk_transitIO kfdParallelIO kfdDiscoverDeep 
kfgDiscoverDeep kfgDiscoverGroup kfgTableCb kfgGrpTableCbInternal 
kfgGrpTableCb qerfxFetch opifch2 kpoal8 opiodr ttcpip opitsk opiino opiodr 
opidrv sou2o opimai_real ssthrdmain main __libc_start_main _start

因为我们做的hanganalyze的short stack不全,于是我让他又做了一个完整的errorstack。内容如下。

----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=b8a2pdbq3p2mj) -----
select grpnum_kfdsk, number_kfdsk, compound_kfdsk, incarn_kfdsk, mntsts_kfdsk, hdrsts_kfdsk, compat_kfdsk, mode_kfdsk, state_kfdsk, redun_kfdsk, libnam_kfdsk, totmb_kfdsk, usedmb_kfdsk, asmname_kfdsk, failname_kfdsk, label_kfdsk, path_kfdsk, udid_kfdsk, kfkid_kfdsk, crdate_kfdsk, mtdate_kfdsk, timer_kfdsk , dbcompat_kfdsk,  product_kfdsk , osmb_kfdsk, prefrd_kfdsk , coldmb_kfdsk, hotmb_kfdsk, blksz_kfdsk, flags_kfdsk ,asmhashval_kfdsk from x$kfdsk

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFF42C57A68 ? 000000001 ?
                                                   000000001 ? 000000003 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFF42C57A68 ? 000000001 ?
                                                   000000000 ? 000000003 ?
ksedst()+34          call     ksedst1()            000000001 ? 000000001 ?
                                                   7FFF42C57A68 ? 000000001 ?
                                                   000000000 ? 000000003 ?
dbkedDefDump()+2741  call     ksedst()             000000001 ? 000000001 ?
                                                   7FFF42C57A68 ? 000000001 ?
                                                   000000000 ? 000000003 ?
ksedmp()+36          call     dbkedDefDump()       000000003 ? 000000000 ?
                                                   7FFF42C57A68 ? 000000001 ?
                                                   000000000 ? 000000003 ?
ksdxfdmp()+1837      call     ksedmp()             000000003 ? 000000000 ?
                                                   7FFF42C57A68 ? 000000001 ?
                                                   000000000 ? 000000003 ?
ksdxcb()+1876        call     ksdxfdmp()           7FFF42C5D360 ? 000000011 ?
                                                   000000003 ? 7FFF42C5D2C0 ?
                                                   7FFF42C5D220 ? 000000003 ?
sspuser()+112        call     ksdxcb()             000000001 ? 000000011 ?
                                                   000000001 ? 000000001 ?
                                                   7FFF42C5D220 ? 000000003 ?
__sighandler()       call     sspuser()            000000001 ? 000000011 ?
                                                   000000001 ? 000000001 ?
                                                   7FFF42C5D220 ? 000000003 ?
io_queue_run()+100   signal   __sighandler()       7F194F920000 ? 000000001 ?
                                                   000000080 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   7FFF42C5EFB0 ? 7FFF42C5EFB0 ?
skgfrliopo()+273     call     io_queue_run()       7F194F920000 ? 000000001 ?
                                                   000000080 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   7FFF42C5EFB0 ? 7FFF42C5EFB0 ?
skgfospo()+180       call     skgfrliopo()         7FFF42C5F178 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   7FFF42C5EFB0 ? 7FFF42C5EFB0 ?
                                                   000000000 ? 0FFFFFFFF ?
skgfrwat()+394       call     skgfospo()           7FFF42C5EFB0 ? 7FFF42C5EFB0 ?
                                                   7F194CE32038 ? 000000000 ?
                                                   000000001 ? 0FFFFFFFF ?
ksfdwtio()+261       call     skgfrwat()           7FFF42C5F178 ? 7F194CE32038 ?
                                                   000000001 ? 7FFF42C5F1A8 ?
                                                   7FFF42C5DFA8 ? 0FFFFFFFF ?
ksfdwat_internal()+  call     ksfdwtio()           000000001 ? 000000028 ?
180                                                07FFFFFFF ? 7FFF42C5F2F8 ?
                                                   7FFF42C5DFA8 ? 0FFFFFFFF ?
kfk_reap_ufs_async_  call     ksfdwat_internal()   000000001 ? 000000028 ?
io()+96                                            07FFFFFFF ? 7FFF42C5F2F8 ?
                                                   7FFF42C5DFA8 ? 0FFFFFFFF ?
kfk_reap_ios_from_s  call     kfk_reap_ufs_async_  000000001 ? 000000028 ?
ubsys()+292                   io()                 7FFF42C5DFA8 ? 7FFF42C5F2F8 ?
                                                   7FFF42C5DFA8 ? 0FFFFFFFF ?
kfk_reap_ios()+2033  call     kfk_reap_ios_from_s  000000001 ? 07FFFFFFF ?
                              ubsys()              000000001 ? 7FFF42C5F2F8 ?
                                                   7FFF42C5DFA8 ? 0FFFFFFFF ?
kfk_io1()+1103       call     kfk_reap_ios()       7F194CE48938 ? 00000000A ?
                                                   0FFFFFFFF ? 7FFF42C5F2F8 ?
                                                   7FFF42C5DFA8 ? 0FFFFFFFF ?
kfkRequest()+9       call     kfk_io1()            7F194CE48938 ? 00000000A ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000001 ? 7FFF42C5DFA8 ?
                                                   00A7B4DD0 ?
kfk_transitIO()+169  call     kfkRequest()         7F194CE48938 ? 00000000A ?
1                                                  FFFFFFFFFFFFFFFF ?
                                                   000000001 ? 7FFF42C5DFA8 ?
                                                   00A7B4DD0 ?
kfdParallelIO()+207  call     kfk_transitIO()      7FFF42C603D0 ? 000000000 ?
0                                                  000000000 ? 000000000 ?
                                                   7FFF42C60380 ? 00000000A ?
kfdDiscoverDeep()+1  call     kfdParallelIO()      00000000A ? 000000000 ?
94                                                 000000000 ? 000000000 ?
                                                   000000000 ? 7FFF42C60558 ?
kfgDiscoverDeep()+3  call     kfdDiscoverDeep()    0D1B9BF30 ? 000000000 ?
66                                                 000000000 ? 000000000 ?
                                                   000000000 ? 7FFF42C60558 ?
kfgDiscoverGroup()+  call     kfgDiscoverDeep()    000000000 ? 000000000 ?
864                                                000000000 ? 000000000 ?
                                                   000000000 ? 7FFF42C60558 ?
kfgTableCb()+1562    call     kfgDiscoverGroup()   7F194CB7FD88 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   0D1B9BF30 ? 000000000 ?
kfdDskTableCbIntern  call     kfgTableCb()         7F194CBA9E58 ? 7FFF00000001 ?
al()+228                                           7F194CB9AAA0 ? 000000000 ?
                                                   000000007 ? 7F1900000000 ?
kfdDskTableCb()+51   call     kfdDskTableCbIntern  7F194CBA9E58 ? 7FFF00000001 ?
                              al()                 000000007 ? 7F1900000000 ?
                                                   0CAF615A0 ? 000000001 ?
qerfxFetch()+2210    call     kfdDskTableCb()      7F194CBA9E58 ? 7FFF00000001 ?
                                                   000000007 ? 7F1900000000 ?
                                                   0CAF615A0 ? 000000001 ?
opifch2()+3022       call     qerfxFetch()         0CAF63260 ? 7F194CBA9AD0 ?
                                                   0081D2D48 ? 7FFF42C60D90 ?
                                                   000000001 ? 000000001 ?
kpoal8()+2934        call     opifch2()            000000089 ? 000000005 ?
                                                   7FFF42C613B0 ? 7F194CBA8F90 ?
                                                   000000001 ? 000000001 ?
opiodr()+916         call     kpoal8()             00000005E ? 00000001C ?
                                                   7FFF42C64310 ? 7F194CBA8F90 ?
                                                   000000001 ? 100000001 ?
ttcpip()+2242        call     opiodr()             00000005E ? 00000001C ?
                                                   7FFF42C64310 ? 000000000 ?
                                                   008BE8B30 ? 100000001 ?
opitsk()+1668        call     ttcpip()             00A7CA8B0 ? 00852A310 ?
                                                   7FFF42C64310 ? 000000000 ?
                                                   7FFF42C63D68 ? 7FFF42C64504 ?
opiino()+961         call     opitsk()             00A7CA8B0 ? 000000001 ?
                                                   7FFF42C64310 ? 000000000 ?
                                                   7FFF42C63D68 ? 7FFF42C64504 ?
opiodr()+916         call     opiino()             00000003C ? 000000004 ?
                                                   7FFF42C65A88 ? 000000000 ?
                                                   7FFF42C63D68 ? 7FFF42C64504 ?
opidrv()+565         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFF42C65A88 ? 000000000 ?
                                                   008BE85E0 ? 7FFF42C64504 ?
sou2o()+98           call     opidrv()             00000003C ? 000000004 ?
                                                   7FFF42C65A88 ? 000000000 ?
                                                   008BE85E0 ? 7FFF42C64504 ?
opimai_real()+128    call     sou2o()              7FFF42C65A60 ? 00000003C ?
                                                   000000004 ? 7FFF42C65A88 ?
                                                   008BE85E0 ? 7FFF42C64504 ?
ssthrdmain()+252     call     opimai_real()        000000002 ? 7FFF42C65C50 ?
                                                   000000004 ? 7FFF42C65A88 ?
                                                   008BE85E0 ? 7FFF42C64504 ?
main()+196           call     ssthrdmain()         000000002 ? 7FFF42C65C50 ?
                                                   000000001 ? 000000000 ?
                                                   008BE85E0 ? 7FFF42C64504 ?
__libc_start_main()  call     main()               000000002 ? 7FFF42C65DE8 ?
+230                                               000000001 ? 000000000 ?
                                                   008BE85E0 ? 7FFF42C64504 ?
_start()+36          call     __libc_start_main()  0008BA95C ? 000000002 ?
                                                   7FFF42C65DD8 ? 0038087C0 ?
                                                   008BE85E0 ? 000000002 ?
 

可以看到堆栈的内容一摸一样,和Bug 14002325 : SELECT * FROM V$ASM_DISKGROUP HANGING.但是遗憾的是这篇文档没有告诉我们形成的原因,也没有告诉我们解决的办法。它只说了这是一个bug。面对这样的情况,我们能做的唯一办法就是规避。那么怎么规避这个问题呢?那么我们需要了解下原理。

首先当我们查询v$asm_disk和v$asm_diskgroup这两个视图的时候,ASM实例会启动pzNN进程,该进程会去和gpnp的守护进程进行通信。这一步主要的目的是从gpnp的profile文件当中获取到discovery string。然后根绝这个discovery string去查活动的disk。我们在执行这个查找discovery string的操作的时候,将会把这些信息记录到RBAL的trace文件里面。据Oracle称是为了出于bebug的目的。要想规避这个问题,我们可以查询v$asm_disk_stat和v$asm_diskgroup_stat这两个视图,查询这两个视图的时候不需要去和gpnp守护进程通信,也不用去查discovery_string。这样我们就规避了这个问题。我们再看一下这两个视图的定义。

SQL> desc v$asm_diskgroup
 Name                                                                                               Null?    Type
 -------------------------------------------------------------------------------------------------- -------- -------
 GROUP_NUMBER                                                                                                NUMBER
 NAME                                                                                                        VARCHAR2(30)
 SECTOR_SIZE                                                                                                 NUMBER
 BLOCK_SIZE                                                                                                  NUMBER
 ALLOCATION_UNIT_SIZE                                                                                        NUMBER
 STATE                                                                                                       VARCHAR2(11)
 TYPE                                                                                                        VARCHAR2(6)
 TOTAL_MB                                                                                                    NUMBER
 FREE_MB                                                                                                     NUMBER
 HOT_USED_MB                                                                                                 NUMBER
 COLD_USED_MB                                                                                                NUMBER
 REQUIRED_MIRROR_FREE_MB                                                                                     NUMBER
 USABLE_FILE_MB                                                                                              NUMBER
 OFFLINE_DISKS                                                                                               NUMBER
 COMPATIBILITY                                                                                               VARCHAR2(60)
 DATABASE_COMPATIBILITY                                                                                      VARCHAR2(60)
 VOTING_FILES                                                                                                VARCHAR2(1)

SQL> desc v$asm_diskgroup_stat
 Name                                                                                               Null?    Type
 -------------------------------------------------------------------------------------------------- -------  -------
 GROUP_NUMBER                                                                                                NUMBER
 NAME                                                                                                        VARCHAR2(30)
 SECTOR_SIZE                                                                                                 NUMBER
 BLOCK_SIZE                                                                                                  NUMBER
 ALLOCATION_UNIT_SIZE                                                                                        NUMBER
 STATE                                                                                                       VARCHAR2(11)
 TYPE                                                                                                        VARCHAR2(6)
 TOTAL_MB                                                                                                    NUMBER
 FREE_MB                                                                                                     NUMBER
 HOT_USED_MB                                                                                                 NUMBER
 COLD_USED_MB                                                                                                NUMBER
 REQUIRED_MIRROR_FREE_MB                                                                                     NUMBER
 USABLE_FILE_MB                                                                                              NUMBER
 OFFLINE_DISKS                                                                                               NUMBER
 COMPATIBILITY                                                                                               VARCHAR2(60)
 DATABASE_COMPATIBILITY                                                                                      VARCHAR2(60)
 VOTING_FILES                                                                                                VARCHAR2(1)

SQL> select * from v$FIXED_VIEW_DEFINITION where view_name ='GV$ASM_DISKGROUP';    

VIEW_NAME            VIEW_DEFINITION
-----------------    -----------------------------------------------------------------------------------------------------------------------------------------------
GV$ASM_DISKGROUP
select   g.inst_id, g.number_kfgrp, g.name_kfgrp,   g.sector_kfgrp, g.blksize_kfgrp, g.ausize_kfgrp,   decode(g.state_kfgrp,     0, 'INVALID', 1, 'UNKNOWN', 2, 'DISMOUNTED', 3
, 'CREATING',     4, 'MOUNTING', 5, decode(bitand(g.lflags_kfgrp, 1), 1,        'QUIESCING', 'MOUNTED'),     6, 'DISMOUNTING', 7, 'CONNECTED',     8, 'BROKEN', 9, 'CONNECTING'
, 10, 'BREAKING', 11, 'DROPPING',     12, 'DROPPING', 255, 'RESTRICTED'),   decode(g.type_kfgrp, 1, 'EXTERN', 2, 'NORMAL', 3, 'HIGH'),   g.totmb_kfgrp, g.freemb_kfgrp, g.hotmb
_kfgrp, g.coldmb_kfgrp,   g.minspc_kfgrp, g.usable_kfgrp, g.offline_kfgrp,   g.compat_kfgrp, g.dbcompat_kfgrp,   decode(bitand(g.flags32_kfgrp, 2048), 2048, 'Y', 'N')   from x
$kfgrp g where state_kfgrp != 0


SQL> select * from v$FIXED_VIEW_DEFINITION where view_name ='GV$ASM_DISKGROUP_STAT';

VIEW_NAME            VIEW_DEFINITION
-----------------    ---------------------------------------------------------------------------------------------------------------------------------------------
GV$ASM_DISKGROUP_STAT
select   g.inst_id, g.number_kfgrp, g.name_kfgrp,   g.sector_kfgrp, g.blksize_kfgrp, g.ausize_kfgrp,   decode(g.state_kfgrp,     0, 'INVALID', 1, 'UNKNOWN', 2, 'DISMOUNTED', 3
, 'CREATING',     4, 'MOUNTING', 5, decode(bitand(g.lflags_kfgrp, 1), 1,        'QUIESCING', 'MOUNTED'),     6, 'DISMOUNTING', 7, 'CONNECTED',     8, 'BROKEN', 9, 'CONNECTING'
, 10, 'BREAKING', 11, 'DROPPING',     12, 'DROPPING', 255, 'RESTRICTED'),   decode(g.type_kfgrp, 1, 'EXTERN', 2, 'NORMAL', 3, 'HIGH'),   g.totmb_kfgrp, g.freemb_kfgrp, g.hotmb
_kfgrp, g.coldmb_kfgrp,   g.minspc_kfgrp, g.usable_kfgrp, g.offline_kfgrp,   g.compat_kfgrp, g.dbcompat_kfgrp,   decode(bitand(g.flags32_kfgrp, 2048), 2048, 'Y', 'N')   from x
$kfgrp_stat g where state_kfgrp != 0 and number_kfgrp != 0

可以看到这两个视图的字段是一摸一样的。而它们查询的基表一个是x$kfgrp,另外一个是x$kfgrp_stat.

分享到: 更多