Saturday, November 14, 2015

Disappearing Histograms

In general, once you have a histogram on a column you keep it.  However, this is not because DBMS_STATS simply maintains it because it is there, but because it is still appropriate to have it based on the column workload usage.
Since Oracle 10g, the default method for collecting histograms is AUTO, which means that Oracle determines whether to collect a histogram based on data distribution and the workload of the column.
What is less well known (including to me until recently), is that histograms can be removed if there is no column workload to justify them.

This can be an issue when for some reason you have imported statistics on a freshly rebuilt table with no column usage.  I think the first two scenarios are the most likely:
  1. Export statistics.  Reorganize table by recreation.  Import statistics.  
  2. On a test database, import cut down data set, but with full volume production statistics.
  3. Column usage is deleted.
In all cases, the histograms will be removed by the maintenance window statistics job if column usage has not built up again.  This can lead to changes in the execution plan and performance problems.  When this happens in a production system, it is often a transient problem until the column usage builds up and the histograms are usually reinstated when statistics are next collected.
In the third scenario you have the potential to refer back to any statistics history (see Mauro Pagano's blog Histograms Come Histograms Go), but that will not be the case where the original table has been dropped.

Demonstration

I will create a table and populate it.  The data in columns A, B and C is perfectly uniform, only column SKEW is skewed.
drop table t purge
/
create table t
(a number
,b number
,c number
,skew number
)
/
insert into t 
with x as (
select rownum n from dual connect by level <= 10
)
select a.n, b.n, c.n, a.n*b.n*c.n 
from x a, x b, x c
/
Note that I have collected statistics with SKEWONLY. So statistics will be collected if the data distribution is skewed irrespective of the usage of the column.
REM collect stats and histograms on everything
begin
 dbms_stats.gather_table_stats(ownname=>'SCOTT' ,tabname=>'T' ,method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY'
 );
end;
/
REM check for histogram
select table_name, column_name, num_distinct, num_nulls, histogram, num_buckets, sample_size
from   dba_tab_col_statistics
where  owner = 'SCOTT'
and    table_name = 'T'
/
Now I have histograms on all columns, even though the data in columns A, B, and C is not skewed at all but is perfectly uniform,  This shows the value of the column usage feature in limiting the excessive collection of histograms.
Table Column      Num   Num               Num  Sample
Name  Name   Distinct Nulls Histogram Buckets    Size
----- ------ -------- ----- --------- ------- -------                                                 
T     A            10     0 FREQUENCY      10    1000
T     B            10     0 FREQUENCY      10    1000
T     C            10     0 FREQUENCY      10    1000
T     SKEW        120     0 FREQUENCY     120    1000
At this point, there is no column usage. So when I collect statistics the histogram will all disappear.
REM check no column usage
select *
from   sys.col_usage$ u
where  obj# IN(select object_id 
               from  dba_objects 
               WHERE owner = 'SCOTT'
               AND   object_name = 'T'
               AND   object_Type = 'TABLE')
/
REM collect the stats with default col usage
begin
 dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T');
end;
/
REM check for histogram - should have gone with no col usage
select table_name, column_name, num_distinct, num_nulls, histogram, num_buckets, sample_size
from   dba_tab_col_statistics
where  owner = 'SCOTT'
and    table_name = 'T'
/

Table Column      Num   Num               Num  Sample
Name  Name   Distinct Nulls Histogram Buckets    Size
----- ------ -------- ----- --------- ------- -------
T     A            10     0 NONE            1    1000
T     B            10     0 NONE            1    1000
T     C            10     0 NONE            1    1000
T     SKEW        120     0 NONE            1    1000
Sure enough, there are now no histograms on the columns, and there is just a single bucket on each column. So now I will generate some column usage, I will do some equality queries on the column SKEW, and I will flush the monitoring information to the data dictionary.
set serveroutput off
declare
 z NUMBER;
begin
 for a in 1..10 loop
  for b in 1..10 loop
   for c in 1..10 loop
    EXECUTE IMMEDIATE 'SELECT '||'/*'||(a*b*c)||'*/'||' count(*) FROM t where skew = '||(a*b*c) INTO z;
   end loop;
  end loop;
 end loop;
end;
/
exec dbms_stats.flush_database_monitoring_info;
select *
from sys.col_usage$ u
where obj# IN(select object_id 
 from  dba_objects 
 WHERE owner = 'SCOTT'
  AND   object_name = 'T'
 AND   object_Type = 'TABLE')
/
      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
     98167          4              1              0                 0           0          0          0 04-NOV-15
The column usage report tells us that there has been an equality predicate used on the column SKEW.
select dbms_stats.report_col_usage('SCOTT','T') from dual
/
DBMS_STATS.REPORT_COL_USAGE('SCOTT','T')
--------------------------------------------------------------------------------
LEGEND:                                                                                                                                 
.......                                                                                               
                                                
EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate   
LIKE       : Used in single table LIKE predicate    
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate             
NONEQ_JOIN : Used in NON EQuality JOIN predicate         
FILTER     : Used in single table FILTER predicate       
JOIN       : Used in JOIN predicate                      

DBMS_STATS.REPORT_COL_USAGE('SCOTT','T')                 
-------------------------------------------------------------------------------- 
GROUP_BY   : Used in GROUP BY expression
...............................................................................
###############################################################################
                       COLUMN USAGE REPORT FOR SCOTT.T                                                                       
................................
1. SKEW                                : EQ                                                           
###############################################################################
Now, when I collect statistics with the default METHOD_OPT, I will get a histogram on SKEW,
begin
 dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T');
end;
/
REM check for histogram - should have gone with no col usage
select table_name, column_name, num_distinct, num_nulls, histogram, num_buckets, sample_size
from dba_tab_col_statistics
where owner = 'SCOTT'
and  table_name = 'T'
/

Table Column      Num   Num               Num  Sample
Name  Name   Distinct Nulls Histogram Buckets    Size
----- ------ -------- ----- --------- ------- -------
T     A            10     0 NONE            1    1000
T     B            10     0 NONE            1    1000
T     C            10     0 NONE            1    1000
T     SKEW        120     0 FREQUENCY     120    1000
Nest, I will delete the column usage and collect the statistics again
REM remove the column usage
exec dbms_stats.reset_col_usage('SCOTT','T');
begin
 dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T');
end;
/
REM check for histogram - should have gone with no col usage
select table_name, column_name, num_distinct, num_nulls, histogram, num_buckets, sample_size
from dba_tab_col_statistics
where owner = 'SCOTT'
and  table_name = 'T'
/
And now the histogram has gone. QED.

Table Column      Num   Num               Num  Sample
Name  Name   Distinct Nulls Histogram Buckets    Size
----- ------ -------- ----- --------- ------- -------
T     A            10     0 NONE            1    1000
T     B            10     0 NONE            1    1000
T     C            10     0 NONE            1    1000
T     SKEW        120     0 NONE            1    1000

Conclusion

As you add histograms to a table, the time taken to collect statistics and parse statements increases.  I think having a mechanism to remove unwanted histograms is desirable.  However, it has the potential to cause problems, so just be aware of it.
If you are sure that you need histograms on a particular column, then I would suggest creating table preferences for METHOD_OPT and managing those preferences along with the table DDL.

No comments :