Friday, November 20, 2020

Oracle 12.2: New Statistic Preference PREFERENCE_OVERRIDES_PARAMETER

Introduction

There is a new statistic preference, PREFERENCE_OVERRIDES_PARAMETER available from Oracle 12.2.  It allows the DBA to override any parameters specified when gathering statistics in favour of any statistics preferences that are defined.  This new preference can be specified at database-level or at table-level, or both.

From the introduction of the cost-based optimizer in Oracle 7, we all had to write scripts to collect statistics.  The introduction of the maintenance window in Oracle 10g was supposed to supersede that with a regularly scheduled maintenance window.  It still is not uncommon to find systems that rely on custom scripts that collect object statistics.  Sometimes, commands to collect statistics are embedded in applications.  

It remains perfectly reasonable to choose to collect statistics on certain objects at the exact times when it is most appropriate.  For example: just after a table has been populated, or perhaps refresh the statistics on a very large table at a quiet time.  

Since 11g, Oracle has provided global and table statistics preferences to specify how statistics are to be collected.  This declarative method is generally recommended instead of specifying parameter on calls to DBMS_STATS.  The main advantage is consistency. When statistics are collected on a table, they will always be collected in the same way, including during the maintenance window job.  However, if scripts and programs still specify parameters when they call DBMS_STATS, those parameters will override the preferences.

Two scenarios in which enabling PREFERENCE_OVERRIDES_PARAMETER would be advantageous immediately come to mind.

  • The hash-based algorithm, introduced in 11g, to calculate the number of distinct values on a column only applies if the ESTIMATE_PERCENT parameter is AUTO_SAMPLE_SIZE, which is the default (see How does AUTO_SAMPLE_SIZE work in Oracle Database 12c? by Nigel Bayliss).  This new algorithm produces more accurate answers than even a large sample size, and much more quickly because there is no need to sort the sampled data for each column.  Therefore, ESTIMATE_PERCENT should no longer be specified.
  • I have found applications that specify a METHOD_OPT parameter to collect histograms that do not make sense.  For example, PeopleSoft used to use FOR ALL INDEXED COLUMNS SIZE 1 in the code that collected statistics.  That would collect column minimum and maximum column values only on index columns, but the column statistics on any unindexed columns simply would not be updated.   If scripts collect histograms that should be retained, then those METHOD_OPTs should be defined as a table preference.

See also Overriding DBMS_STATS Parameter Settings by Maria Colgan

Demonstration

The default value of PREFERENCE_OVERRIDES_PARAMETER is false.  Therefore, the default is the status quo, parameters override preferences.

exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','FALSE');

I am going to create two tables with 50,000 rows each. 

DROP TABLE t1 PURGE;
DROP TABLE t2 PURGE;
CREATE TABLE t1 AS SELECT * FROM all_objects WHERE rownum <= 50000;
CREATE UNIQUE INDEX t1_idx ON t1 (owner, object_type, object_name, subobject_name);
CREATE TABLE t2 AS SELECT /*+NO_GATHER_OPTIMIZER_STATISTICS*/ * FROM all_objects WHERE rownum <= 50000;
CREATE UNIQUE INDEX t2_idx ON t2 (owner, object_type, object_name, subobject_name);
@tstats
REM tstats.sql
set pages 99 lines 200 trimspool on autotrace off
column table_name format a10
column column_name format a30
column PREFERENCE_OVERRIDES_PARAMETER format a30
break on report
alter session set nls_date_Format = 'hh24:mi:ss dd/mm/yyyy';
spool tstats
SELECT DBMS_STATS.GET_PREFS('PREFERENCE_OVERRIDES_PARAMETER') AS PREFERENCE_OVERRIDES_PARAMETER
FROM   dual;
SELECT table_name, sample_size, num_rows, last_analyzed
FROM user_tables
WHERE table_name in('T1','T2')
ORDER BY 1;

break on table_name skip 1
SELECT table_name, column_name, num_distinct, num_Buckets, histogram, last_analyzed
FROM user_tab_columns
WHERE table_name in('T1','T2')
order by 1,2;
spool off

Real-time statistics were collected on T1.  Note that I have column statistics on T1, but no histograms.  I suppressed statistics collection on T2. 

If I had run this test on an autonomous database then I would have had histograms because _optimizer_gather_stats_on_load_hist=TRUE

PREFERENCE_OVERRIDES_PARAMETER
------------------------------
FALSE

TABLE_NAME SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
---------- ----------- ---------- -------------------
T1               50000      50000 15:33:06 13/11/2020
T2

TABLE_NAME COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM       LAST_ANALYZED
---------- ------------------------------ ------------ ----------- --------------- -------------------
T1         APPLICATION                               1           1 NONE            15:33:06 13/11/2020
           CREATED                                 520           1 NONE            15:33:06 13/11/2020
           CREATED_APPID                             0           0 NONE            15:33:06 13/11/2020
           CREATED_VSNID                             0           0 NONE            15:33:06 13/11/2020
           DATA_OBJECT_ID                          343           1 NONE            15:33:06 13/11/2020
           DEFAULT_COLLATION                         1           1 NONE            15:33:06 13/11/2020
           DUPLICATED                                1           1 NONE            15:33:06 13/11/2020
           EDITIONABLE                               1           1 NONE            15:33:06 13/11/2020
           EDITION_NAME                              0           0 NONE            15:33:06 13/11/2020
           GENERATED                                 2           1 NONE            15:33:06 13/11/2020
           LAST_DDL_TIME                           736           1 NONE            15:33:06 13/11/2020
           MODIFIED_APPID                            0           0 NONE            15:33:06 13/11/2020
           MODIFIED_VSNID                            0           0 NONE            15:33:06 13/11/2020
           NAMESPACE                                 7           1 NONE            15:33:06 13/11/2020
           OBJECT_ID                             50000           1 NONE            15:33:06 13/11/2020
           OBJECT_NAME                           45268           1 NONE            15:33:06 13/11/2020
           OBJECT_TYPE                              24           1 NONE            15:33:06 13/11/2020
           ORACLE_MAINTAINED                         2           1 NONE            15:33:06 13/11/2020
           OWNER                                     8           1 NONE            15:33:06 13/11/2020
           SECONDARY                                 1           1 NONE            15:33:06 13/11/2020
           SHARDED                                   1           1 NONE            15:33:06 13/11/2020
           SHARING                                   4           1 NONE            15:33:06 13/11/2020
           STATUS                                    1           1 NONE            15:33:06 13/11/2020
           SUBOBJECT_NAME                           77           1 NONE            15:33:06 13/11/2020
           TEMPORARY                                 2           1 NONE            15:33:06 13/11/2020
           TIMESTAMP                               602           1 NONE            15:33:06 13/11/2020

T2         APPLICATION                                             NONE
           CREATED                                                 NONE
           CREATED_APPID                                           NONE
           CREATED_VSNID                                           NONE
           DATA_OBJECT_ID                                          NONE
           DEFAULT_COLLATION                                       NONE
           DUPLICATED                                              NONE
           EDITIONABLE                                             NONE
           EDITION_NAME                                            NONE
           GENERATED                                               NONE
           LAST_DDL_TIME                                           NONE
           MODIFIED_APPID                                          NONE
           MODIFIED_VSNID                                          NONE
           NAMESPACE                                               NONE
           OBJECT_ID                                               NONE
           OBJECT_NAME                                             NONE
           OBJECT_TYPE                                             NONE
           ORACLE_MAINTAINED                                       NONE
           OWNER                                                   NONE
           SECONDARY                                               NONE
           SHARDED                                                 NONE
           SHARING                                                 NONE
           STATUS                                                  NONE
           SUBOBJECT_NAME                                          NONE
           TEMPORARY                                               NONE
           TIMESTAMP                                               NONE

I will now gather statistics on both tables with an explicit sample size and a METHOD_OPT that does not collect statistics on unindexed columns.

EXEC dbms_stats.gather_table_stats(user,'T1',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
EXEC dbms_stats.gather_table_stats(user,'T2',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
@tstats
  • I can see that I got sample sizes close to 5000 for each table, and the number of rows is 10 times larger, so it was a 10% sample size.
  • There is a unique index on each table on 4 columns.  Only the column statistics for those 4 columns were updated.  On T1, we can see the column statistics were collected at different times.
TABLE_NAME SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
---------- ----------- ---------- -------------------
T1                4937      49370 15:33:46 13/11/2020
T2                5013      50130 15:33:47 13/11/2020


TABLE_NAME COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM       LAST_ANALYZED
---------- ------------------------------ ------------ ----------- --------------- -------------------
T1         APPLICATION                               1           1 NONE            15:33:06 13/11/2020
           CREATED                                 520           1 NONE            15:33:06 13/11/2020
           CREATED_APPID                             0           0 NONE            15:33:06 13/11/2020
           CREATED_VSNID                             0           0 NONE            15:33:06 13/11/2020
           DATA_OBJECT_ID                          343           1 NONE            15:33:06 13/11/2020
           DEFAULT_COLLATION                         1           1 NONE            15:33:06 13/11/2020
           DUPLICATED                                1           1 NONE            15:33:06 13/11/2020
           EDITIONABLE                               1           1 NONE            15:33:06 13/11/2020
           EDITION_NAME                              0           0 NONE            15:33:06 13/11/2020
           GENERATED                                 2           1 NONE            15:33:06 13/11/2020
           LAST_DDL_TIME                           736           1 NONE            15:33:06 13/11/2020
           MODIFIED_APPID                            0           0 NONE            15:33:06 13/11/2020
           MODIFIED_VSNID                            0           0 NONE            15:33:06 13/11/2020
           NAMESPACE                                 7           1 NONE            15:33:06 13/11/2020
           OBJECT_ID                             50000           1 NONE            15:33:06 13/11/2020
           OBJECT_NAME                           40572           1 NONE            15:33:46 13/11/2020
           OBJECT_TYPE                              18          10 HEIGHT BALANCED 15:33:46 13/11/2020
           ORACLE_MAINTAINED                         2           1 NONE            15:33:06 13/11/2020
           OWNER                                     7           7 FREQUENCY       15:33:46 13/11/2020
           SECONDARY                                 1           1 NONE            15:33:06 13/11/2020
           SHARDED                                   1           1 NONE            15:33:06 13/11/2020
           SHARING                                   4           1 NONE            15:33:06 13/11/2020
           STATUS                                    1           1 NONE            15:33:06 13/11/2020
           SUBOBJECT_NAME                            8           1 NONE            15:33:46 13/11/2020
           TEMPORARY                                 2           1 NONE            15:33:06 13/11/2020
           TIMESTAMP                               602           1 NONE            15:33:06 13/11/2020

T2         APPLICATION                                             NONE
           CREATED                                                 NONE
           CREATED_APPID                                           NONE
           CREATED_VSNID                                           NONE
           DATA_OBJECT_ID                                          NONE
           DEFAULT_COLLATION                                       NONE
           DUPLICATED                                              NONE
           EDITIONABLE                                             NONE
           EDITION_NAME                                            NONE
           GENERATED                                               NONE
           LAST_DDL_TIME                                           NONE
           MODIFIED_APPID                                          NONE
           MODIFIED_VSNID                                          NONE
           NAMESPACE                                               NONE
           OBJECT_ID                                               NONE
           OBJECT_NAME                           43403           1 NONE            15:33:47 13/11/2020
           OBJECT_TYPE                              14          10 HEIGHT BALANCED 15:33:47 13/11/2020
           ORACLE_MAINTAINED                                       NONE
           OWNER                                     7           7 FREQUENCY       15:33:47 13/11/2020
           SECONDARY                                               NONE
           SHARDED                                                 NONE
           SHARING                                                 NONE
           STATUS                                                  NONE
           SUBOBJECT_NAME                           10           1 NONE            15:33:47 13/11/2020
           TEMPORARY                                               NONE
           TIMESTAMP                                               NONE

Now I am going to enable PREFERENCE_OVERRIDES_PARAMETER at database-level, but I am also going to disable it for T1, so the parameters still override the preference.  I would like histograms on the indexed columns on T2, so I am going to specify a METHOD_OPT  table preference.  If I mix FOR ALL COLUMNS  and FOR ALL INDEXED COLUMNS, whichever is specified first will be overridden by the second. So instead, I must explicitly list the indexed columns.

connect / as sysdba
exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');
connect scott/tiger
exec dbms_stats.set_table_prefs(user,'T1','PREFERENCE_OVERRIDES_PARAMETER','FALSE');
exec dbms_stats.set_table_prefs(user,'T2','METHOD_OPT'
              ,'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 10 owner, object_type, object_name, subobject_name');

EXEC dbms_stats.gather_table_stats(user,'T1',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
EXEC dbms_stats.gather_table_stats(user,'T2',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
@tstats

  • For T2, the sample size and the number of rows are both 50000.  The new NDV algorithm has produced the correct answer.  I didn't get any histograms other than on the columns I specified because there isn't any table usage yet.
  • T1 used the 10% sample size and again only the statistics on the indexed columns were updated.
  • I didn't get a histogram on T1 on OBJECT_NAME and SUBOBJECT_NAME, and I got a height-balanced histogram on OBJECT_TYPE because there were more distinct values than the number of buckets specified.  
  • However, on T2, Oracle collected hybrid histograms on OBJECT_NAME and SUBOBJECT and a top-frequency histogram on OBJECT_TYPE.  This type of histogram is only gathered if ESTIMATE_PERCENT is AUTO_SAMPLE_SIZE.
PREFERENCE_OVERRIDES_PARAMETER
------------------------------
TRUE


TABLE_NAME SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
---------- ----------- ---------- -------------------
T1                5000      50000 15:56:58 13/11/2020
T2               50000      50000 15:56:59 13/11/2020


TABLE_NAME COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM       LAST_ANALYZED
---------- ------------------------------ ------------ ----------- --------------- -------------------
T1         APPLICATION                               1           1 NONE            15:33:06 13/11/2020
           CREATED                                 520           1 NONE            15:33:06 13/11/2020
           CREATED_APPID                             0           0 NONE            15:33:06 13/11/2020
           CREATED_VSNID                             0           0 NONE            15:33:06 13/11/2020
           DATA_OBJECT_ID                          343           1 NONE            15:33:06 13/11/2020
           DEFAULT_COLLATION                         1           1 NONE            15:33:06 13/11/2020
           DUPLICATED                                1           1 NONE            15:33:06 13/11/2020
           EDITIONABLE                               1           1 NONE            15:33:06 13/11/2020
           EDITION_NAME                              0           0 NONE            15:33:06 13/11/2020
           GENERATED                                 2           1 NONE            15:33:06 13/11/2020
           LAST_DDL_TIME                           736           1 NONE            15:33:06 13/11/2020
           MODIFIED_APPID                            0           0 NONE            15:33:06 13/11/2020
           MODIFIED_VSNID                            0           0 NONE            15:33:06 13/11/2020
           NAMESPACE                                 7           1 NONE            15:33:06 13/11/2020
           OBJECT_ID                             50000           1 NONE            15:33:06 13/11/2020
           OBJECT_NAME                           42274           1 NONE            15:56:58 13/11/2020
           OBJECT_TYPE                              19          10 HEIGHT BALANCED 15:56:58 13/11/2020
           ORACLE_MAINTAINED                         2           1 NONE            15:33:06 13/11/2020
           OWNER                                     8           8 FREQUENCY       15:56:58 13/11/2020
           SECONDARY                                 1           1 NONE            15:33:06 13/11/2020
           SHARDED                                   1           1 NONE            15:33:06 13/11/2020
           SHARING                                   4           1 NONE            15:33:06 13/11/2020
           STATUS                                    1           1 NONE            15:33:06 13/11/2020
           SUBOBJECT_NAME                           14           1 NONE            15:56:58 13/11/2020
           TEMPORARY                                 2           1 NONE            15:33:06 13/11/2020
           TIMESTAMP                               602           1 NONE            15:33:06 13/11/2020

T2         APPLICATION                               1           1 NONE            15:56:59 13/11/2020
           CREATED                                 520           1 NONE            15:56:59 13/11/2020
           CREATED_APPID                             0           0 NONE            15:56:59 13/11/2020
           CREATED_VSNID                             0           0 NONE            15:56:59 13/11/2020
           DATA_OBJECT_ID                          343           1 NONE            15:56:59 13/11/2020
           DEFAULT_COLLATION                         1           1 NONE            15:56:59 13/11/2020
           DUPLICATED                                1           1 NONE            15:56:59 13/11/2020
           EDITIONABLE                               1           1 NONE            15:56:59 13/11/2020
           EDITION_NAME                              0           0 NONE            15:56:59 13/11/2020
           GENERATED                                 2           1 NONE            15:56:59 13/11/2020
           LAST_DDL_TIME                           736           1 NONE            15:56:59 13/11/2020
           MODIFIED_APPID                            0           0 NONE            15:56:59 13/11/2020
           MODIFIED_VSNID                            0           0 NONE            15:56:59 13/11/2020
           NAMESPACE                                 7           1 NONE            15:56:59 13/11/2020
           OBJECT_ID                             50000           1 NONE            15:56:59 13/11/2020
           OBJECT_NAME                           45268          10 HYBRID          15:56:59 13/11/2020
           OBJECT_TYPE                              24          10 TOP-FREQUENCY   15:56:59 13/11/2020
           ORACLE_MAINTAINED                         2           1 NONE            15:56:59 13/11/2020
           OWNER                                     8           8 FREQUENCY       15:56:59 13/11/2020
           SECONDARY                                 1           1 NONE            15:56:59 13/11/2020
           SHARDED                                   1           1 NONE            15:56:59 13/11/2020
           SHARING                                   4           1 NONE            15:56:59 13/11/2020
           STATUS                                    1           1 NONE            15:56:59 13/11/2020
           SUBOBJECT_NAME                           77          10 HYBRID          15:56:59 13/11/2020
           TEMPORARY                                 2           1 NONE            15:56:59 13/11/2020
           TIMESTAMP                               602           1 NONE            15:56:59 13/11/2020

Conclusion

I think there is a strong case for enabling PREFERENCE_OVERIDES_PARAMETER at database level on all databases from 12.2.

exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');

If your application explicitly collects statistics, or if you have legacy scripts that collect statistics, and either explicitly specify GATHER_TABLE_STATS parameters, then setting this parameter will revert them to the default.  This is particularly valuable in the case of ESTIMATE_PERCENT as it will default to AUTO_SAMPLE_SIZE and you will get improved row count estimates based on the new 12c algorithm will less work. 

If you don't have this problem in the first place, then enabling PREFERENCE_OVERIDES_PARAMETER at database level will prevent this problem from developing in the future!

If you have tables with requirements for particular statistics collections (e.g. METHOD_OPT, GRANULARITY, etc.) and you don't wish to simply use the defaults, then these variations should be implemented as table statistics preferences. If for some reason that is not possible, PREFERENCE_OVERIDES_PARAMETER can be disabled again at table level, also with a table statistics preference.

No comments :