10.2.0.5升级运行utlrp.sql长时间Hang的一例

给一个客户做Oracle 10g升级,最后一步运行utlrp.sql编译无效对象的时候hang住了,而且hang了很久,开始以为等一会就没有事了,后来查询无效对象的个数一直都没有发生变化,于是就开始着手寻找原因.

1.查询v$session,查看等待事件和SQL_ID.在这之前我查看了我运行utlrp.sql的sid.

SQL> select sid,event,sql_id from v$session;

       SID EVENT                                                            SQL_ID
---------- ---------------------------------------------------------------- -------------
       463 jobq slave wait
       481 jobq slave wait
       509 jobq slave wait
       515 jobq slave wait
       523 Streams AQ: qmn slave idle wait
       524 os thread startup                                                73hrt07s8adn5
       525 Streams AQ: waiting for time management or cleanup tasks         4gd6b1r53yt88
       526 jobq slave wait
       534 Streams AQ: qmn coordinator idle wait
       536 SQL*Net message to client                                        2h0tzf68rbxzz
       537 rdbms ipc message
       539 rdbms ipc message                                                4gd6b1r53yt88
       540 rdbms ipc message
       541 rdbms ipc message
       542 smon timer
       543 rdbms ipc message
       544 rdbms ipc message
       545 rdbms ipc message
       546 rdbms ipc message
       547 rdbms ipc message
       548 rdbms ipc message
       549 rdbms ipc message
       550 rdbms ipc message
       551 rdbms ipc message
       552 rdbms ipc message
       553 rdbms ipc message
       554 rdbms ipc message
       555 pmon timer

28 rows selected.

2.根据sql_id来查询正在执行的SQL语法,此时发现正在创建一个索引,而且并行度是128.开始怀疑是并行度的问题.

SQL> select SQL_FULLTEXT from v$sql where sql_id='73hrt07s8adn5';

SQL_FULLTEXT
--------------------------------------------------------------------------------
CREATE INDEX utl_recomp_comp_idx1 ON utl_recomp_compiled(obj#) PARALLEL 128

3.查看parallel_max_servers参数,发现设置的值很大.修改完参数为4之后,再次运行1分钟就结束了.

SQL> show parameter parallel_max_servers

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
parallel_max_servers                 integer     485
SQL>

SQL> alter system set parallel_max_servers=4 scope=both;

System altered.

SQL>
SQL> @?/rdbms/admin/utlrp.sql

TIMESTAMP
--------------------------------------------------------------------------------
COMP_TIMESTAMP UTLRP_BGN  2012-03-06 22:04:37

DOC>   The following PL/SQL block invokes UTL_RECOMP to recompile invalid
DOC>   objects in the database. Recompilation time is proportional to the
DOC>   number of invalid objects in the database, so this command may take
DOC>   a long time to execute on a database with a large number of invalid
DOC>   objects.
DOC>
DOC>   Use the following queries to track recompilation progress:
DOC>
DOC>   1. Query returning the number of invalid objects remaining. This
DOC>      number should decrease with time.
DOC>         SELECT COUNT(*) FROM obj$ WHERE status IN (4, 5, 6);
DOC>
DOC>   2. Query returning the number of objects compiled so far. This number
DOC>      should increase with time.
DOC>         SELECT COUNT(*) FROM UTL_RECOMP_COMPILED;
DOC>
DOC>   This script automatically chooses serial or parallel recompilation
DOC>   based on the number of CPUs available (parameter cpu_count) multiplied
DOC>   by the number of threads per CPU (parameter parallel_threads_per_cpu).
DOC>   On RAC, this number is added across all RAC nodes.
DOC>
DOC>   UTL_RECOMP uses DBMS_SCHEDULER to create jobs for parallel
DOC>   recompilation. Jobs are created without instance affinity so that they
DOC>   can migrate across RAC nodes. Use the following queries to verify
DOC>   whether UTL_RECOMP jobs are being created and run correctly:
DOC>
DOC>   1. Query showing jobs created by UTL_RECOMP
DOC>         SELECT job_name FROM dba_scheduler_jobs
DOC>            WHERE job_name like 'UTL_RECOMP_SLAVE_%';
DOC>
DOC>   2. Query showing UTL_RECOMP jobs that are running
DOC>         SELECT job_name FROM dba_scheduler_running_jobs
DOC>            WHERE job_name like 'UTL_RECOMP_SLAVE_%';
DOC>#

PL/SQL procedure successfully completed.

TIMESTAMP
--------------------------------------------------------------------------------
COMP_TIMESTAMP UTLRP_END  2012-03-06 22:05:06

PL/SQL procedure successfully completed.

DOC> The following query reports the number of objects that have compiled
DOC> with errors (objects that compile with errors have status set to 3 in
DOC> obj$). If the number is higher than expected, please examine the error
DOC> messages reported with each object (using SHOW ERRORS) to see if they
DOC> point to system misconfiguration or resource constraints that must be
DOC> fixed before attempting to recompile these objects.
DOC>#

OBJECTS WITH ERRORS
-------------------
                  0

DOC> The following query reports the number of errors caught during
DOC> recompilation. If this number is non-zero, please query the error
DOC> messages in the table UTL_RECOMP_ERRORS to see if any of these errors
DOC> are due to misconfiguration or resource constraints that must be
DOC> fixed before objects can compile successfully.
DOC>#

ERRORS DURING RECOMPILATION
---------------------------
                          0

PL/SQL procedure successfully completed.

这里因为客户将parallel_max_servers参数设置成485才导致出现了执行缓慢.在这里介绍一下这个参数.parallel_max_servers是指设置并行执行可以使用的最大进程数量.根据官方文档里面的公式介绍:

PARALLEL_MAX_SERVERS CPU_COUNT x PARALLEL_THREADS_PER_CPU x (2 if PGA_AGGREGATE_TARGET > 0; otherwise 1) x 5

上面的公式还涉及到另外两个参数,一个是cpu_count,这个参数其实是物理cpu的个数,另外一个参数parallel_threads_per_cpu是依赖于操作系统的值,通常这个值是2.该参数值描述一个cpu在执行并行操作的时候,可以使用的线程或者进程数量.举个例子:假设我的物理cpu是4个.并且我设置了pga_aggregate_target的值.则这个parallel_max_server的数量就应该是4*2*2*5=80,在开始创建数据库的时候会自动计算出来并进行设置.

参考文档:http://docs.oracle.com/cd/B19306_01/server.102/b14223/usingpe.htm#sthref2310

分享到: 更多

Post a Comment

Your email is never published nor shared. Required fields are marked *