Wednesday, January 15, 2020

On-Line Statistics Gathering Disabled by Column Specific METHOD_OPT Table Statistics Preference

I have come across a quirk where the presence of a table statistics preference that specifies METHOD_OPT that is specific to some columns disables on-line statistics gathering.  This behaviour is at least not documented.  I have reproduced this in Oracle version 12.1.0.2 and 19.3.

Demonstration 

I will create two identical tables, but on the first table, I will specify a table statistic preference to collect a histogram on column C.
set serveroutput on verify on autotrace off
CREATE TABLE t1(a number, b varchar2(1000), c number);
CREATE TABLE t2(a number, b varchar2(1000), c number);
exec dbms_stats.set_table_prefs(user,'t1','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 254 C');
To make the test rerunnable, I will truncate each table, delete any statistics (because truncate does not delete statistics) and then populate the table again in direct-path mode.
TRUNCATE TABLE t1;
EXEC dbms_stats.delete_table_stats(user,'T1');
INSERT /*+APPEND*/ INTO t1 
SELECT ROWNUM a, TO_CHAR(TO_DATE(rownum,'J'),'Jsp') b, CEIL(SQRT(rownum)) c 
FROM dual CONNECT BY level <= 1e5;

TRUNCATE TABLE t2;
EXEC dbms_stats.delete_table_stats(user,'T2');
INSERT /*+APPEND*/ INTO t2
SELECT ROWNUM a, TO_CHAR(TO_DATE(rownum,'J'),'Jsp') b, CEIL(SQRT(rownum)) c 
FROM dual CONNECT BY level <= 1e5;
COMMIT;
I expect to get statistics on both tables.
alter session set nls_date_Format = 'hh24:mi:ss dd/mm/yy';
column table_name format a10
column column_name format a11
SELECT table_name, num_rows, last_analyzed FROM user_tables WHERE table_name LIKE 'T_' ORDER BY 1;
SELECT table_name, column_name, num_distinct, histogram, num_buckets FROM user_tab_columns WHERE table_name LIKE 'T_' ORDER BY 1,2;
But I only get table and column statistics on T2, the one without the statistics preference.
TABLE_NAME   NUM_ROWS LAST_ANALYZED
---------- ---------- -----------------
T1
T2             100000 10:08:30 15/01/20

Table Column
Name  Name   NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
----- ------ ------------ --------------- -----------
T1    A                   NONE
T1    B                   NONE
T1    C                   NONE
T2    A            100000 NONE                      1
T2    B             98928 NONE                      1
T2    C               317 NONE                      1
It appears that I don't get statistics on T1 because I have specified a METHOD_OPT table statistic preference that is specific to some named columns. It doesn't have to specify creating a histogram, it might be preventing a histogram from being created.
For example, this preference does not disable on-line statistics collection.
EXEC dbms_stats.set_table_prefs(user,'t2','METHOD_OPT','FOR ALL COLUMNS SIZE 1');
But these preferences do disable on-line statistics collection.
EXEC dbms_stats.set_table_prefs(user,'t2','METHOD_OPT','FOR COLUMNS SIZE 1 B C');
EXEC dbms_stats.set_table_prefs(user,'t2','METHOD_OPT','FOR COLUMNS SIZE 1 A B C');
I have not found any other statistics preferences (for other DBMS_STATS parameters) that cause this behaviour.

Conclusion 

Table preferences are recommended as a method of controlling statistics collection declaratively and consistently. You don't have to specify parameters to DBMS_STATS into scripts that collect statistics ad-hoc. The table statistics preferences provide a method that every time statistics are collected on a particular table, they are collected consistently, albeit in a way that may be different from the default.
However, take the example of an ETL process loading data into a data warehouse. If you rely on on-line statistics gathering to collect table statistics as a part of a data load process, you must now be careful not to disable statistics collection during the load with METHOD_OPT statistics preferences.

Autonomous & 19c Addendum

(Updated 21st January 2020) In the Oracle Autonomous Data Warehouse, in order to make statistics collection as self-managing as possible, Oracle sets two undocumented parameters.
  • _optimizer_gather_stats_on_load_hist=TRUE - so that histograms are created on every column when on-line statistics are gathered
  • _optimizer_gather_stats_on_load_all=TRUE - so that on-line statistics are collected for every direct-path insert, not just the first one.
Creating a column specific METHOD_OPT statistics preference disables this behaviour.
From 19c, on Engineered systems, Oracle collects Real-Time statistics.  The METHOD_OPT preference does not disable this behaviour, but you only get Real-Time statistics if you have conventional statistics in the first place.  If you don't get statistics after a bulk load, then you won't get Real-Time statistics until conventional statistics have been explicitly gathered, for example, by the maintenance job.

(Update 22nd January 2020) This has been filed as bug 30787109.

1 comment :

ChandlerDBA said...

Oh! Interesting quirk!

Of course, a data load only gives you partial stats (e.g. no histograms) with the online gather anyway (you're supposed to perform a gather_table_stats with option=>'gather auto' following the load) so it seems strange to disable all of the stats given method_opt restrictions wouldn't be used anyway.