Since Oracle 10g, the default method for collecting histograms is AUTO, which means that Oracle determines whether to collect a histogram based on data distribution and the workload of the column.
What is less well known (including to me until recently), is that histograms can be removed if there is no column workload to justify them.
- Export statistics. Reorganize table by recreation. Import statistics.
- On a test database, import cut down data set, but with full volume production statistics.
- Column usage is deleted.
In all cases, the histograms will be removed by the maintenance window statistics job if column usage has not built up again. This can lead to changes in the execution plan and performance problems. When this happens in a production system, it is often a transient problem until the column usage builds up and the histograms are usually reinstated when statistics are next collected.
In the third scenario you have the potential to refer back to any statistics history (see Mauro Pagano's blog Histograms Come Histograms Go), but that will not be the case where the original table has been dropped.
In the third scenario you have the potential to refer back to any statistics history (see Mauro Pagano's blog Histograms Come Histograms Go), but that will not be the case where the original table has been dropped.
Demonstration
I will create a table and populate it. The data in columns A, B and C is perfectly uniform, only column SKEW is skewed.drop table t purge
/
create table t
(a number
,b number
,c number
,skew number
)
/
insert into t
with x as (
select rownum n from dual connect by level <= 10
)
select a.n, b.n, c.n, a.n*b.n*c.n
from x a, x b, x c
/
Note that I have collected statistics with SKEWONLY. So statistics will be collected if the data distribution is skewed irrespective of the usage of the column.
REM collect stats and histograms on everything
begin
dbms_stats.gather_table_stats(ownname=>'SCOTT' ,tabname=>'T' ,method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY'
);
end;
/
REM check for histogram
select table_name, column_name, num_distinct, num_nulls, histogram, num_buckets, sample_size
from dba_tab_col_statistics
where owner = 'SCOTT'
and table_name = 'T'
/
Now I have histograms on all columns, even though the data in columns A, B, and C is not skewed at all but is perfectly uniform, This shows the value of the column usage feature in limiting the excessive collection of histograms.
Table Column Num Num Num Sample
Name Name Distinct Nulls Histogram Buckets Size
----- ------ -------- ----- --------- ------- -------
T A 10 0 FREQUENCY 10 1000
T B 10 0 FREQUENCY 10 1000
T C 10 0 FREQUENCY 10 1000
T SKEW 120 0 FREQUENCY 120 1000
At this point, there is no column usage. So when I collect statistics the histogram will all disappear.
REM check no column usage
select *
from sys.col_usage$ u
where obj# IN(select object_id
from dba_objects
WHERE owner = 'SCOTT'
AND object_name = 'T'
AND object_Type = 'TABLE')
/
REM collect the stats with default col usage
begin
dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T');
end;
/
REM check for histogram - should have gone with no col usage
select table_name, column_name, num_distinct, num_nulls, histogram, num_buckets, sample_size
from dba_tab_col_statistics
where owner = 'SCOTT'
and table_name = 'T'
/
Table Column Num Num Num Sample
Name Name Distinct Nulls Histogram Buckets Size
----- ------ -------- ----- --------- ------- -------
T A 10 0 NONE 1 1000
T B 10 0 NONE 1 1000
T C 10 0 NONE 1 1000
T SKEW 120 0 NONE 1 1000
Sure enough, there are now no histograms on the columns, and there is just a single bucket on each column. So now I will generate some column usage, I will do some equality queries on the column SKEW, and I will flush the monitoring information to the data dictionary.
set serveroutput off
declare
z NUMBER;
begin
for a in 1..10 loop
for b in 1..10 loop
for c in 1..10 loop
EXECUTE IMMEDIATE 'SELECT '||'/*'||(a*b*c)||'*/'||' count(*) FROM t where skew = '||(a*b*c) INTO z;
end loop;
end loop;
end loop;
end;
/
exec dbms_stats.flush_database_monitoring_info;
select *
from sys.col_usage$ u
where obj# IN(select object_id
from dba_objects
WHERE owner = 'SCOTT'
AND object_name = 'T'
AND object_Type = 'TABLE')
/
OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
98167 4 1 0 0 0 0 0 04-NOV-15
The column usage report tells us that there has been an equality predicate used on the column SKEW.
select dbms_stats.report_col_usage('SCOTT','T') from dual
/
DBMS_STATS.REPORT_COL_USAGE('SCOTT','T')
--------------------------------------------------------------------------------
LEGEND:
.......
EQ : Used in single table EQuality predicate
RANGE : Used in single table RANGE predicate
LIKE : Used in single table LIKE predicate
NULL : Used in single table is (not) NULL predicate
EQ_JOIN : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER : Used in single table FILTER predicate
JOIN : Used in JOIN predicate
DBMS_STATS.REPORT_COL_USAGE('SCOTT','T')
--------------------------------------------------------------------------------
GROUP_BY : Used in GROUP BY expression
...............................................................................
###############################################################################
COLUMN USAGE REPORT FOR SCOTT.T
................................
1. SKEW : EQ
###############################################################################
Now, when I collect statistics with the default METHOD_OPT, I will get a histogram on SKEW,
begin
dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T');
end;
/
REM check for histogram - should have gone with no col usage
select table_name, column_name, num_distinct, num_nulls, histogram, num_buckets, sample_size
from dba_tab_col_statistics
where owner = 'SCOTT'
and table_name = 'T'
/
Table Column Num Num Num Sample
Name Name Distinct Nulls Histogram Buckets Size
----- ------ -------- ----- --------- ------- -------
T A 10 0 NONE 1 1000
T B 10 0 NONE 1 1000
T C 10 0 NONE 1 1000
T SKEW 120 0 FREQUENCY 120 1000
Nest, I will delete the column usage and collect the statistics again
REM remove the column usage
exec dbms_stats.reset_col_usage('SCOTT','T');
begin
dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T');
end;
/
REM check for histogram - should have gone with no col usage
select table_name, column_name, num_distinct, num_nulls, histogram, num_buckets, sample_size
from dba_tab_col_statistics
where owner = 'SCOTT'
and table_name = 'T'
/
And now the histogram has gone. QED.
Table Column Num Num Num Sample
Name Name Distinct Nulls Histogram Buckets Size
----- ------ -------- ----- --------- ------- -------
T A 10 0 NONE 1 1000
T B 10 0 NONE 1 1000
T C 10 0 NONE 1 1000
T SKEW 120 0 NONE 1 1000
Conclusion
As you add histograms to a table, the time taken to collect statistics and parse statements increases. I think having a mechanism to remove unwanted histograms is desirable. However, it has the potential to cause problems, so just be aware of it.
If you are sure that you need histograms on a particular column, then I would suggest creating table preferences for METHOD_OPT and managing those preferences along with the table DDL.
If you are sure that you need histograms on a particular column, then I would suggest creating table preferences for METHOD_OPT and managing those preferences along with the table DDL.