在Oracle11g OCM考试中,我们接触了扩展统计信息(Extended Statistics),该功能主要是告诉优化器列和列之间存在着关系或表达式。
我们先来举个例子来说明这个问题。创建一个四列的表。
create table test1 as select mod(n, 100) a, mod(n, 100) b , mod(n, 50) c , mod(n, 20) d from (select level n from dual connect by level <= 10001);
测试表中列1和列2的关系是,他们是相等的。这里我们先收集统计信息和所有裂伤的直方图信息。
exec dbms_stats.gather_Table_stats('AAA', 'TEST1', estimate_percent => null, method_opt => 'for all columns size 254'); SQL> select TABLE_NAME,COLUMN_NAME,NUM_DISTINCT,HISTOGRAM from dba_tab_col_statistics where TABLE_NAME='TEST1' and owner='AAA'; TABLE_NAME COLUMN_NAME NUM_DISTINCT HISTOGRAM ------------------------------ ------------------------------ ------------ --------------- TEST1 A 100 FREQUENCY TEST1 B 100 FREQUENCY TEST1 C 50 FREQUENCY TEST1 D 20 FREQUENCY
我们来查询一个a=10的情况,正好这里rows=100,优化器能够准确的估算出基数。
explain plan for select count(*) from test1 where a=10;
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3896847026
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| TEST1 | 100 | 300 | 9 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2- filter("A"=10)
14 rows selected.
如果我们在增加一个谓词条件b=10。由于a和b是相等的,添加n2=10的条件,不应该改变行源的基数,但是我们在实际看到的结果却是1.
SQL> explain plan for select count(*) from test1 where a=10 and b=10;
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3896847026
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| TEST1 | 1 | 6 | 9 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2- filter("A"=10AND"B"=10)
14 rows selected.
我们可以得知优化器CBO估算的一个公式是:
rows = count * (1/NDV for n1) * (1/NDV for n2)
= 10000 * (1/100) * (1/100) =1 row.
优化器认为selectivity是1/NDV。而没有考虑直方图。但是n1 = 10和n2 = 10这种情况会有100行。在一些复杂的SQL里面,这种不正确的基数会导致访问效率低下。那接下来我们创建一个扩展的统计信息来让CBO识别这些列值之间的相关性。
SQL> select dbms_stats.create_extended_stats(ownname =>'AAA', tabname =>'TEST1',extension =>'(a,b)') as n1_n2_extened from dual; N1_N2_EXTENED --------------------------------------------------------------------------------------------------------------------------------------------- SYS_STUNA$6DVXJXTP05EH56DTIR0X SQL> select extension_name, extension from dba_stat_extensions where table_name = 'TEST1' and owner='AAA'; EXTENSION_NAME EXTENSION ------------------------------ ------------------------------ SYS_STUNA$6DVXJXTP05EH56DTIR0X ("A","B") 再次收集直方图信息。 exec dbms_stats.gather_Table_stats('AAA', 'TEST1', estimate_percent => null, method_opt => 'for all columns size 254'); SQL> select TABLE_NAME,COLUMN_NAME,NUM_DISTINCT,HISTOGRAM from dba_tab_col_statistics where TABLE_NAME='TEST1' and owner='AAA'; TABLE_NAME COLUMN_NAME NUM_DISTINCT HISTOGRAM ------------------------------ ------------------------------ ------------ --------------- TEST1 A 100 FREQUENCY TEST1 B 100 FREQUENCY TEST1 C 50 FREQUENCY TEST1 D 20 FREQUENCY TEST1 SYS_STUNA$6DVXJXTP05EH56DTIR0X 100 FREQUENCY SQL> explain plan for select count(*) from test1 where a=10 and b=10; Explained. SQL> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 3896847026 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 6 | 9 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 6 | | | |* 2 | TABLE ACCESS FULL| TEST1 | 100 | 600 | 9 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2- filter("A"=10AND"B"=10) 14 rows selected.
可以看到这一次就非常的准,因为优化器理解了A列和B列之间的关系。
Post a Comment