- A simple demonstration to show the behaviour of extended statistics and how it can be disabled by the presence of histograms. None of this is new, there are many other blog posts on this topic. I provide links to some of them.
- I have added an enhancement to the EDB360 utility to detect histograms on columns in extended statistics.
Introduction
'Extended statistics were introduced in Oracle 11g to allow statistics to be gathered on groups of columns, to highlight the relationship between them, or on expressions. Oracle 11gR2 makes the process of gathering extended statistics for column groups easier'. [Tim Hall: https://oracle-base.com/articles/11g/extended-statistics-enhancements-11gr2]- Extended Statistics - Multi-Column (Column Group) Statistics.
- Automatic Column Group Detection in Oracle Database 12c Release 1 (12.1).
Example 1: Cardinality from Extended Statistics
Without extended statistics, Oracle will simply multiply column cardinalities together. Here is a simple example. I will create a table with 10000 rows, where two columns each have the same 100 rows of 100 values, so they correlate perfectly. I will gather statistics, but no histograms.create table t
(k number
,a number
,b number
,x varchar2(1000)
);
insert into t
with n as (select rownum n from dual connect by level <= 100)
select rownum, a.n, a.n, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
from n a, n b
order by a.n, b.n;
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE 1');
I will deliberately disable optimizer feedback so that Oracle cannot learn from experience about the cardinality misestimates.alter session set statistics_level=ALL;
alter session set "_optimizer_use_feedback"=FALSE;
select count(*) from t where a = 42 and b=42;
COUNT(*)
----------
100
Oracle estimates that it will get 1 row but actually gets 100.It estimates 1 because it is 1/100 * 1/100 * 10000 rows
select * from table(dbms_xplan.display_cursor(null,null,format=>'ADVANCED +ALLSTATS LAST, IOSTATS -PROJECTION -OUTLINE'));
Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 26 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 1 | 26 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
Now I will create extended statistics on the column group. I can do that in one of two ways:- either by explicitly creating the definition and then creating them by gathering statistics:
SELECT dbms_stats.create_extended_stats(ownname=>user, tabname=>'t', extension=>'(a,b)')
FROM dual;
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE 1');
- Or, I can create extended statistics directly in one go by specifying them in the method opt clause.
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE 1, FOR COLUMNS SIZE 1 (A,B)');
Now Oracle correctly estimates that the same query will fetch 100 rows because it directly knows the cardinality for the two columns in the query.Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
Example 2: Cardinality from Index Statistics
I can get exactly the same effect by creating an index on the two columns.drop table t purge;
create table t
(k number
,a number
,b number
,x varchar2(1000)
);
insert into t
with n as (select rownum n from dual connect by level <= 100)
select rownum, a.n, a.n, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
from n a, n b
order by a.n, b.n;
create index t_ab on t(a,b) compress;
This time I have not collected any statistics on the table. Statistics are automatically collected on the index when it is built. I have used a hint to stop the query using the index to look up the rows, nonetheless, Oracle has correctly estimated that it will get 100 rows because it has used the number of distinct keys from the index statistics.SQL_ID 711banpfgfa18, child number 0
-------------------------------------
select /*+FULL(t)*/ count(*) from t where a = 42 and b=42
Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 74 |
| 1 | SORT AGGREGATE | | 1 | 1 | 26 | | | 1 |00:00:00.01 | 74 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 2600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 74 |
---------------------------------------------------------------------------------------------------------------------
Note, that there is nothing in the execution plan to indicate that the index statistics were used to estimate the number of rows returned!Example 3: Histograms Disable the Use of Extended Statistics
There have long been blogs that refer to behaviour that Oracle has documented as Bug 6972291: Column group selectivity is not used when there is a histogram on one column:'As of 10.2.0.4 CBO can use the selectivity of column groups but this option is disabled if there is a histogram defined on any of the columns of the column group.
Note: This fix is disabled by default. To enable the fix set "_fix_control"="6972291:ON"
When ENABLED the code will use multi-column stats regardless of whether there is a histogram on one of the columns or not. When DISABLED (default) CBO will not use multi-column stats if there is a histogram on one of the columns in the column group.'
- Christian Antognini, 2014: https://antognini.ch/2014/02/extension-bypassed-because-of-missing-histogram/
- Jonathan Lewis, 2012: https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/
- Maria Colgan also commented: 'This … was a deliberate design decision to prevent over-estimations when one of the values supplied is ‘out of range’. We can’t ignore the ‘out of range’ scenario just because we have a column group. Extended statistics do not contain the min, max values for the column group so we rely on the individual column statistics to check for ‘out of range’ scenarios like yours. When one of the columns is ‘out of range’, we revert back to the column statistics because we know it is going to generate a lower selectivity range and if one of the columns is ‘out of range’ then the number of rows returned will be lower or none at all, as in your example'
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 100 A, FOR COLUMNS SIZE 1 B (A,B)');
My cardinality estimate goes back to 1 because Oracle does use the extended statistics in the presence of a histogram on any of the constituent columns. Exactly the same happens if the number of distinct values on the combination of columns comes from composite index statistics. A histogram similarly disables their use.SQL_ID 8trj2kacqhm6f, child number 1
-------------------------------------
select count(*) from t where a = 42 and b=42
Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 1 | 6 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
This is likely to happen in real-life systems because histograms can be automatically created when statistics are gathered.Out-of-Range Predicates
If you have one or more predicates on columns that are part of an extended statistics, and that predicate goes out of range when compared to the column statistics, then Oracle still doesn’t use the extended statistics (see also https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/), irrespective of whether it has a histogram or not, or whether fix control 6972291 is set or not.The extended histogram uses a virtual column whose value is derived from SYS_OP_COMBINED_HASH(). You can see this in the default data value for the column. Therefore the optimizer cannot use the minimum/maximum value (see also https://jonathanlewis.wordpress.com/2018/08/02/extended-histograms-2/).
Instead, Oracle does the linear decay of the density of the column predicates, and if there is a frequency or top-frequency histogram then it uses half the density of the lowest frequency bucket and applies linear decay to that.
Example 4: Extended Histograms
This time I will create a histogram on my extended statistics as well as histograms on the underlying columns.exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 100 A B (A,B)');
I am back to getting the correct cardinality estimate.SQL_ID 8trj2kacqhm6f, child number 0
-------------------------------------
select count(*) from t where a = 42 and b=42
Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
And this is also something that has been blogged about previously:- Jonathan Lewis, 2018: https://jonathanlewis.wordpress.com/2018/07/31/extended-histograms/
Threats
In this blog, Jonathan Lewis comments (https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/) on certain weaknesses in the implementations. He also references other bloggers.Either creating or removing histograms on columns in either extended statistics or composite indexes may result in the execution plan changing because those extended statistics may change. This could happen automatically when gathering statistics as data skew and predicate usage changes.
If I drop a composite index, maybe because it is not used, or maybe it is redundant because it is a subset of another index, then I should replace it with an extended histogram on the same set of columns.
- Jonathan Lewis, 2015: 'It can be a very good idea to have column group stats on leading subsets of the index columns – especially if you add columns to an existing index definition.' - https://jonathanlewis.wordpress.com/2015/12/29/column-groups-2#comment-85443
- Jonathan Lewis, 2016: '… each time you drop[…] an index you really ought to be prepared to create a set of extended stats' - https://jonathanlewis.wordpress.com/2016/12/07/extended-stats-4/
- However, there is a limit of 20 column groups per table (or 1 per 10 columns, whichever be the greater, but also counting existing column groups as they are a type of virtual column). So, you can easily run out of extended statistics.
Detecting the Problem
I have added a report to section 3c of EDB360 to detect the problem. The SQL query is shown below. It will report on histograms on columns in either:- composite indexes where there are no extended column group statistics, or
- extended column group statistics for which there are no histograms.
3c.25. Columns with Histograms in Extended Statistics (DBA_STAT_EXTENSIONS)
#
|
Table
Owner |
Table
Name |
Object
Type |
Index/Extension Name
|
Number of
Distinct Values |
Number of
Buckets |
EXTENSION
|
Column
Name |
Column
Number of Distinct Values |
Column
Number of Buckets |
Column
Histogram Type |
1
|
HR
|
JOB_HISTORY
|
Index
|
JHIST_EMP_ID_ST_DATE_PK
|
10
|
("EMPLOYEE_ID","START_DATE")
|
EMPLOYEE_ID
|
7
|
7
|
FREQUENCY
|
|
2
|
OE
|
INVENTORIES
|
Index
|
INVENTORY_IX
|
1112
|
("WAREHOUSE_ID","PRODUCT_ID")
|
PRODUCT_ID
|
208
|
208
|
FREQUENCY
|
|
3
|
OE
|
ORDER_ITEMS
|
Index
|
ORDER_ITEMS_PK
|
665
|
("ORDER_ID","LINE_ITEM_ID")
|
ORDER_ID
|
105
|
105
|
FREQUENCY
|
|
4
|
OE
|
ORDER_ITEMS
|
Index
|
ORDER_ITEMS_UK
|
665
|
("ORDER_ID","PRODUCT_ID")
|
ORDER_ID
|
105
|
105
|
FREQUENCY
|
|
5
|
OE
|
ORDER_ITEMS
|
Index
|
ORDER_ITEMS_UK
|
665
|
("ORDER_ID","PRODUCT_ID")
|
PRODUCT_ID
|
185
|
185
|
FREQUENCY
|
|
6
|
SCOTT
|
T
|
Extension
|
SYS_STUNA$6DVXJXTP05EH56DTIR0X
|
100
|
1
|
("A","B")
|
A
|
100
|
100
|
FREQUENCY
|
7
|
SCOTT
|
T
|
Extension
|
SYS_STUNA$6DVXJXTP05EH56DTIR0X
|
100
|
1
|
("A","B")
|
B
|
100
|
100
|
FREQUENCY
|
8
|
SOE
|
ORDERS
|
Index
|
ORD_WAREHOUSE_IX
|
10270
|
("WAREHOUSE_ID","ORDER_STATUS")
|
ORDER_STATUS
|
10
|
10
|
FREQUENCY
|
|
9
|
SOE
|
ORDER_ITEMS
|
Index
|
ORDER_ITEMS_PK
|
13758515
|
("ORDER_ID","LINE_ITEM_ID")
|
LINE_ITEM_ID
|
7
|
7
|
FREQUENCY
|
Just because something is reported by this test, does not necessarily mean that you need to change anything.
- Providing fix control 6972291 is not enabled, should you wish to drop or alter any reported index, you at least know that it cannot be used to provide column group statistics. Though you would still need to consider SQL that might use the index directly.
- You might choose to add column group histograms, and sometimes that will involve adding column statistics. However, the number of distinct values on the column group will usually be higher than on the individual columns and can easily be greater than the number of buckets you can have in a frequency histogram. In such cases, from 12c, you may end up with either a Top-frequency histogram or a hybrid histogram.
- Or you might choose to remove the histograms from the individual columns so that the column group statistics are used.
- Or you might choose to enforce the status quo, by setting table statistics preferences to ensure currently existing histograms are preserved and currently, non-existent histograms are not introduced.
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 A B (A,B)');
- Or, you might even enable the fix_control. You can also do that at session level or even statement level (but beware of disabling any other fix controls that may be set).
SQL_ID 16judk2v0uf7w, child number 0
-------------------------------------
select /*+FULL(t) OPT_PARAM('_fix_control','6972291:on')*/ count(*)
from t where a = 42 and b=42
Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
OPT_PARAM('_fix_control' '6972291:1')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "T"@"SEL$1")
END_OUTLINE_DATA
*/
EDB360 Test Query
This is the SQL query that produces the report in EDB360.WITH i as ( /*composite indexes*/
SELECT i.table_owner, i.table_name, i.owner index_owner, i.index_name, i.distinct_keys
, '('||(LISTAGG('"'||c.column_name||'"',',') WITHIN GROUP (order by c.column_position))||')' column_list
FROM dba_indexes i
, dba_ind_columns c
WHERE i.table_owner = c.table_owner
AND i.table_name = c.table_name
AND i.owner = c.index_owner
AND i.index_name = c.index_name
AND i.table_name NOT LIKE 'BIN$%'
AND i.table_owner NOT IN ('ANONYMOUS','APEX_030200','APEX_040000','APEX_040200','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')
AND i.table_owner NOT IN ('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL','PERFSTAT','STDBYPERF','MGDSYS','OJVMSYS')
GROUP BY i.table_owner, i.table_name, i.owner, i.index_name, i.distinct_keys
HAVING COUNT(*) > 1 /*index with more than one column*/
), e as ( /*extended stats*/
SELECT e.owner, e.table_name, e.extension_name
, CAST(e.extension AS VARCHAR(1000)) extension
, se.histogram, se.num_buckets, se.num_distinct
FROM dba_stat_extensions e
, dba_tab_col_statistics se
WHERE e.creator = 'USER'
AND se.owner = e.owner
AND se.table_name = e.table_name
AND se.column_name = e.extension_name
AND e.table_name NOT LIKE 'BIN$%'
AND e.owner NOT IN ('ANONYMOUS','APEX_030200','APEX_040000','APEX_040200','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')
AND e.owner NOT IN ('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL','PERFSTAT','STDBYPERF','MGDSYS','OJVMSYS')
)
SELECT e.owner, e.table_name
, 'Extension' object_type
, e.extension_name object_name, e.num_distinct, e.num_buckets, e.extension
, sc.column_name
, sc.num_distinct col_num_distinct
, sc.num_buckets col_num_buckets
, sc.histogram col_histogram
FROM e
, dba_tab_col_statistics sc
WHERE e.histogram = 'NONE'
AND e.extension LIKE '%"'||sc.column_name||'"%'
AND sc.owner = e.owner
AND sc.table_name = e.table_name
AND sc.histogram != 'NONE'
AND sc.num_buckets > 1 /*histogram on column*/
AND e.num_buckets = 1 /*no histogram on extended stats*/
UNION ALL
SELECT /*+ NO_MERGE */ /* 3c.25 */
i.table_owner, i.table_name
, 'Index' object_type
, i.index_name object_name, i.distinct_keys, TO_NUMBER(null), i.column_list
, sc.column_name
, sc.num_distinct col_num_distinct
, sc.num_buckets col_num_buckets
, sc.histogram col_histogram
From i
, dba_ind_columns ic
, dba_tab_col_statistics sc
WHERE ic.table_owner = i.table_owner
AND ic.table_name = i.table_name
AND ic.index_owner = i.index_owner
AND ic.index_name = i.index_name
AND sc.owner = i.table_owner
AND sc.table_name = ic.table_name
AND sc.column_name = ic.column_name
AND sc.histogram != 'NONE'
AND sc.num_buckets > 1 /*histogram on column*/
AND NOT EXISTS( /*report index if no extension*/
SELECT 'x'
FROM e
WHERE e.owner = i.table_owner
AND e.table_name = i.table_name
AND e.extension = i.column_list)
ORDER BY 1,2,3,4;