客户一套11.2.0.3的RAC数据库系统,运行一些很简单的查询语句,发现速度超级慢。然后进行查询数据字典的操作,也慢的难以忍受。这套库所在的机器配置很好,内存有100多个G,按照道理来说查询一个数据字典不能慢的如此离谱。大概50多秒才能出来。于是对数据字典进行了统计信息收集,发现速度依旧很慢。把语句的执行计划拿出来和正常数据库的执行计划进行比较,执行计划不变。最后通过10046进行观察,发现执行慢的过程中,出现大量的db file scattered read和db file sequential read等待事件。也就是在该数据库上执行查数据字典,物理读很高。这就非常值得怀疑了。看了一下buffer cache的当前大小,发现只有3G。而我们的数据库启动参数设置的buffer cache是50多G,查看alert日志,发现并没有对SGA组件大小的操作。这里需要介绍一下:该数据库memory_target,memory_max_targe,sga_target都设置为0。也就是禁用了11g AMM的特性。在SGA上对各个组件进行了详细的设置。这是一个完全禁用了自动内存管理的数据库。那么疑问是:既然设置了50G的buffer cache怎么只有3G大小。剩下的内存去哪儿了?通过检查发现Shared pool增长到了60G。
POOL NAME BYTES ------------ -------------------------- ---------- java pool free memory 1879048192 large pool free memory 1879048192 shared pool kscdnfyinitflags 8 shared pool parameter table block 22068760 shared pool SQLP 23932720 shared pool KQR L PO 26014192 shared pool FileIdentificatonBlock 30523832 shared pool KGLNA 31756456 shared pool gcs res hash bucket 33554432 shared pool dirty object counts array 33554432 shared pool dbwriter coalesce buffer 33587200 shared pool ksunfy : SSO free list 37453824 shared pool ges big msg buffers 39921208 shared pool event statistics per sess 40108032 shared pool KKSSP 57755752 shared pool KGLDA 58894136 shared pool object queue 101699520 shared pool ges resource 107943072 shared pool ASH buffers 134217728 shared pool dbktb: trace buffer 139575296 shared pool ges enqueues 151274072 shared pool gc name table 201326592 shared pool db_block_hash_buckets 373297152 shared pool state objects 506913016 shared pool Checkpoint queue 536903680 shared pool kglsim heap 549089856 shared pool FileOpenBlock 597301440 shared pool gcs shadows 819820736 shared pool kglsim object batch 1040722032 shared pool gcs resources 1112613856 shared pool SQLA 1129550336 shared pool KGLH0 1256899928 shared pool free memory 1.1934E+10 shared pool KGLHD 4.2610E+10 streams pool spilled:kwqbm 176 streams pool deqtree_kgqmctx 304 streams pool recov_kgqmsub 315032 streams pool free memory 267720696 fixed_sga 2228392 log_buffer 170323968 buffer_cache 3489660928
通过观察,我们发现shared pool的KGLHD增长到了一个很大的值。达到了42个G。所以留给我们的疑问是:
1.在禁用了SGA自动调整后,为什么buffer cache会变小,而shared pool会变大?
2.为什么shared pool的KGLHD会增长到40多G?
要解释第一个问题,我们可以从文档SGA Re-Sizes Occurring Despite AMM/ASMM Being Disabled (MEMORY_TARGET/SGA_TARGET=0) [ID 1269139.1]上找到答案。这里面有一段话如下:
SGA re-sizes are occurring after upgrade to 11.2 despite the fact that automatic memory management (AMM/ASMM) is disabled via the MEMORY_TARGET and SGA_TARGET parameters being set to zero. This typically appears as growth in the __SHARED_POOL_SIZE value and a reduction in the __DB_CACHE_SIZE value being used in the instance, such that __DB_CACHE_SIZE may be shrunk below the DB_CACHE_SIZE value specified in the init.ora/spfile. This is expected behavior in 11.2 for immediate memory allocation requests, which added this as a new feature when automatic memory management was disabled. Enhancement Bug:13340694 has been logged to provide a warning message in the alertlog when such allocations occur to provide the DBA with an indication of a memory problem, as otherwise this can result in the buffer cache being shrunk below the specified minimum value, and hence lead to performance degradation. Ignore the resize operation which is performed to prevent an ORA-4031 error from being raised.
这里告诉我们,在11.2以后,数据库发现如果shared pool不够用,会爆发ORA-04031错误的话,它就会去“偷下”buffer cache的内存。防止ORA-04031问题的产生。虽然Oracle设想的比较好,但是我们发现如果出现Bug导致shared pool使用暴增的话,会一直”偷取”buffer cache的内存,最终导致buffer cache严重不足,引发更为严重的I/O问题。
这里不得不说下Oracle的巧妙设计,当我们设置MEMORY_TARGET和SGA_TARGET参数为0的时候,SGA自动调整将禁用DEFERRED mode的自动调整请求,但是它允许IMMEDIATE mode的自动请求,那么我们怎么分辨请求是DEFERRED mode or IMMEDIATE mode呢?通过查询V$MEMORY_RESIZE_OPS和V$SGA_RESIZE_OPS视图。
SQL> select * from V$MEMORY_RESIZE_OPS; COMPONENT OPER_TYPE OPER_MODE PARAMETER INITIAL_SIZE TARGET_SIZE FINAL_SIZE STATUS START_TIM END_TIME -------------------------------------- --------- ---------------------------------- ----------- ---------- --------- --------- --------- shared pool STATIC shared_pool_size 0 255852544 255852544 COMPLETE 08-JUN-13 08-JUN-13 RECYCLE buffer cache STATIC db_recycle_cache_size 0 0 0 COMPLETE 08-JUN-13 08-JUN-13 PGA Target STATIC pga_aggregate_target 0 293601280 293601280 COMPLETE 08-JUN-13 08-JUN-13 large pool STATIC large_pool_size 0 4194304 4194304 COMPLETE 08-JUN-13 08-JUN-13 java pool STATIC java_pool_size 0 4194304 4194304 COMPLETE 08-JUN-13 08-JUN-13 streams pool STATIC streams_pool_size 0 4194304 4194304 COMPLETE 08-JUN-13 08-JUN-13 SGA Target STATIC sga_target 0 549453824 549453824 COMPLETE 08-JUN-13 08-JUN-13 DEFAULT buffer cache INITIALIZING db_cache_size 272629760 272629760 272629760 COMPLETE 08-JUN-13 08-JUN-13 ASM Buffer Cache STATIC db_cache_size 0 0 0 COMPLETE 08-JUN-13 08-JUN-13 DEFAULT buffer cache STATIC db_cache_size 0 272629760 272629760 COMPLETE 08-JUN-13 08-JUN-13 DEFAULT 2K buffer cache STATIC db_2k_cache_size 0 0 0 COMPLETE 08-JUN-13 08-JUN-13 DEFAULT 4K buffer cache STATIC db_4k_cache_size 0 0 0 COMPLETE 08-JUN-13 08-JUN-13 DEFAULT 8K buffer cache STATIC db_8k_cache_size 0 0 0 COMPLETE 08-JUN-13 08-JUN-13 DEFAULT 16K buffer cache STATIC db_16k_cache_size 0 0 0 COMPLETE 08-JUN-13 08-JUN-13 DEFAULT 32K buffer cache STATIC db_32k_cache_size 0 0 0 COMPLETE 08-JUN-13 08-JUN-13 KEEP buffer cache STATIC db_keep_cache_size 0 0 0 COMPLETE 08-JUN-13 08-JUN-13 DEFAULT buffer cache SHRINK DEFERRED db_cache_size 272629760 260046848 260046848 COMPLETE 09-JUN-13 09-JUN-13 shared pool GROW DEFERRED shared_pool_size 255852544 268435456 268435456 COMPLETE 09-JUN-13 09-JUN-13
这里我自己的测试机,我们可以看到后面两条信息,buffer cache拿出来了一些内存划到了shared pool里面。不过是DEFERRED模式的。因为我设置了memory_target的值,所以这种划是可以的。如果我们把其设置成0的话,就只能支持immediate模式的动态调整。
那么我们怎么阻止这种immediate模式的自动调整呢?这里MOS给出了一个隐含参数:_memory_imm_mode_without_autosga。通过设置该参数为false,我们就能够彻底的阻止immediate模式的SGA自动调整。
alter system set "_memory_imm_mode_without_autosga"=FALSE scope=both;
搞清楚了第一个问题,我们来看第二个问题,为什么我们shared pool中的KGLHD会增长到40多G?我们可以从文档High memory allocations of KGLHD in Shared Pool [ID 1435186.1]找到答案。这个问题是个bug,bug编号是:13250244。
Kgl handles and associated shared memory remained unfreed after flushing the library cache in cases of multiple shared pool subpools. The problem is easily triggered if stored outlines are used, but the use of stored outlines is not required to hit this problem.
产生这个bug的可能原因有两种:
1.使用了多个shared pool subpools,然后在flush library cache之后,Kgl handles和相关联的共享内存不释放。
2.使用了outline.在我们的环境中,我们并没有使用outline。
所以我们几乎可以断定是第一种原因引起的。所以正是因为这些内存不释放,导致Oracle认为将要出现ORA-04031错误,才会去偷buffer cache的内存。Oracle也给出两条SQL语句,让我们去监控KGLHD和outline引起的内存增长。如果我们使用了outline,我们将会看到kglhdnsp中类型为22的将高速的增长。
select count(*), kglhdnsp from x$kglob group by kglhdnsp order by kglhdnsp; select inst_id, bytes/1024/1024 KGLHD from gv$sgastat where name='KGLHD' and pool='shared pool';
最终Oracle给出的workground是:
1.设置”_kghdsidx_count”=1,这样就只需要使用一个shared pool子池。但是这么设置之后,在高并发的环境下面,会引发一些shared pool和library cache latch的一些争用。其实为了保险起见,我建议先把这个参数往下调整,不要直接设置成1。然后慢慢观察。
2.使用11g新出的SPM来代替outline.
到此该问题已经水落石出。我们找到了根本的原因,我们只需要把_kghdsidx_count参数适当进行调整,避免KGLHD持续占用内存。就能有效的控制shared pool对buffer cache的“偷取”。
参考文档:
SGA Re-Sizes Occurring Despite AMM/ASMM Being Disabled (MEMORY_TARGET/SGA_TARGET=0) [ID 1269139.1]
High memory allocations of KGLHD in Shared Pool [ID 1435186.1]
Post a Comment