oracle11g-Extended statistics

在Oracle11g OCM考试中,我们接触了扩展统计信息(Extended Statistics),该功能主要是告诉优化器列和列之间存在着关系或表达式。
我们先来举个例子来说明这个问题。创建一个四列的表。

create table test1 as select mod(n, 100) a, mod(n, 100) b ,
mod(n, 50) c , mod(n, 20) d
from (select level n from dual connect by level <= 10001);

测试表中列1和列2的关系是,他们是相等的。这里我们先收集统计信息和所有裂伤的直方图信息。

exec dbms_stats.gather_Table_stats('AAA', 'TEST1', estimate_percent => null, method_opt => 'for all columns size 254');

SQL> select TABLE_NAME,COLUMN_NAME,NUM_DISTINCT,HISTOGRAM from dba_tab_col_statistics where TABLE_NAME='TEST1' and owner='AAA';

TABLE_NAME COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------------------------ ------------ ---------------
TEST1 A 100 FREQUENCY
TEST1 B 100 FREQUENCY
TEST1 C 50 FREQUENCY
TEST1 D 20 FREQUENCY

我们来查询一个a=10的情况,正好这里rows=100,优化器能够准确的估算出基数。

explain plan for select count(*) from test1 where a=10;

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3896847026

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| TEST1 | 100 | 300 | 9 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2- filter("A"=10)

14 rows selected.

如果我们在增加一个谓词条件b=10。由于a和b是相等的,添加n2=10的条件,不应该改变行源的基数,但是我们在实际看到的结果却是1.

SQL> explain plan for select count(*) from test1 where a=10 and b=10;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3896847026

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| TEST1 | 1 | 6 | 9 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2- filter("A"=10AND"B"=10)

14 rows selected.

我们可以得知优化器CBO估算的一个公式是:
rows = count * (1/NDV for n1) * (1/NDV for n2)
= 10000 * (1/100) * (1/100) =1 row.
优化器认为selectivity是1/NDV。而没有考虑直方图。但是n1 = 10和n2 = 10这种情况会有100行。在一些复杂的SQL里面,这种不正确的基数会导致访问效率低下。那接下来我们创建一个扩展的统计信息来让CBO识别这些列值之间的相关性。

SQL> select dbms_stats.create_extended_stats(ownname =>'AAA', tabname =>'TEST1',extension =>'(a,b)') as n1_n2_extened from dual;

N1_N2_EXTENED
---------------------------------------------------------------------------------------------------------------------------------------------
SYS_STUNA$6DVXJXTP05EH56DTIR0X

SQL> select extension_name, extension from dba_stat_extensions where table_name = 'TEST1' and owner='AAA';

EXTENSION_NAME EXTENSION
------------------------------ ------------------------------
SYS_STUNA$6DVXJXTP05EH56DTIR0X ("A","B")

再次收集直方图信息。
exec dbms_stats.gather_Table_stats('AAA', 'TEST1', estimate_percent => null, method_opt => 'for all columns size 254');

SQL> select TABLE_NAME,COLUMN_NAME,NUM_DISTINCT,HISTOGRAM from dba_tab_col_statistics where TABLE_NAME='TEST1' and owner='AAA';

TABLE_NAME COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------------------------ ------------ ---------------
TEST1 A 100 FREQUENCY
TEST1 B 100 FREQUENCY
TEST1 C 50 FREQUENCY
TEST1 D 20 FREQUENCY
TEST1 SYS_STUNA$6DVXJXTP05EH56DTIR0X 100 FREQUENCY

SQL> explain plan for select count(*) from test1 where a=10 and b=10;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3896847026

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| TEST1 | 100 | 600 | 9 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2- filter("A"=10AND"B"=10)

14 rows selected.

可以看到这一次就非常的准,因为优化器理解了A列和B列之间的关系。

分享到: 更多

Post a Comment

Your email is never published nor shared. Required fields are marked *