Introduction
There is a new statistic preference, PREFERENCE_OVERRIDES_PARAMETER available from Oracle 12.2. It allows the DBA to override any parameters specified when gathering statistics in favour of any statistics preferences that are defined. This new preference can be specified at database-level or at table-level, or both.
From the introduction of the cost-based optimizer in Oracle 7, we all had to write scripts to collect statistics. The introduction of the maintenance window in Oracle 10g was supposed to supersede that with a regularly scheduled maintenance window. It still is not uncommon to find systems that rely on custom scripts that collect object statistics. Sometimes, commands to collect statistics are embedded in applications.
It remains perfectly reasonable to choose to collect statistics on certain objects at the exact times when it is most appropriate. For example: just after a table has been populated, or perhaps refresh the statistics on a very large table at a quiet time.
Since 11g, Oracle has provided global and table statistics preferences to specify how statistics are to be collected. This declarative method is generally recommended instead of specifying parameter on calls to DBMS_STATS. The main advantage is consistency. When statistics are collected on a table, they will always be collected in the same way, including during the maintenance window job. However, if scripts and programs still specify parameters when they call DBMS_STATS, those parameters will override the preferences.
Two scenarios in which enabling PREFERENCE_OVERRIDES_PARAMETER would be advantageous immediately come to mind.
- The hash-based algorithm, introduced in 11g, to calculate the number of distinct values on a column only applies if the ESTIMATE_PERCENT parameter is AUTO_SAMPLE_SIZE, which is the default (see How does AUTO_SAMPLE_SIZE work in Oracle Database 12c? by Nigel Bayliss). This new algorithm produces more accurate answers than even a large sample size, and much more quickly because there is no need to sort the sampled data for each column. Therefore, ESTIMATE_PERCENT should no longer be specified.
- I have found applications that specify a METHOD_OPT parameter to collect histograms that do not make sense. For example, PeopleSoft used to use FOR ALL INDEXED COLUMNS SIZE 1 in the code that collected statistics. That would collect column minimum and maximum column values only on index columns, but the column statistics on any unindexed columns simply would not be updated. If scripts collect histograms that should be retained, then those METHOD_OPTs should be defined as a table preference.
See also Overriding DBMS_STATS Parameter Settings by Maria Colgan
Demonstration
The default value of PREFERENCE_OVERRIDES_PARAMETER is false. Therefore, the default is the status quo, parameters override preferences.
exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','FALSE');
I am going to create two tables with 50,000 rows each.
DROP TABLE t1 PURGE;
DROP TABLE t2 PURGE;
CREATE TABLE t1 AS SELECT * FROM all_objects WHERE rownum <= 50000;
CREATE UNIQUE INDEX t1_idx ON t1 (owner, object_type, object_name, subobject_name);
CREATE TABLE t2 AS SELECT /*+NO_GATHER_OPTIMIZER_STATISTICS*/ * FROM all_objects WHERE rownum <= 50000;
CREATE UNIQUE INDEX t2_idx ON t2 (owner, object_type, object_name, subobject_name);
@tstats
REM tstats.sql
set pages 99 lines 200 trimspool on autotrace off
column table_name format a10
column column_name format a30
column PREFERENCE_OVERRIDES_PARAMETER format a30
break on report
alter session set nls_date_Format = 'hh24:mi:ss dd/mm/yyyy';
spool tstats
SELECT DBMS_STATS.GET_PREFS('PREFERENCE_OVERRIDES_PARAMETER') AS PREFERENCE_OVERRIDES_PARAMETER
FROM dual;
SELECT table_name, sample_size, num_rows, last_analyzed
FROM user_tables
WHERE table_name in('T1','T2')
ORDER BY 1;
break on table_name skip 1
SELECT table_name, column_name, num_distinct, num_Buckets, histogram, last_analyzed
FROM user_tab_columns
WHERE table_name in('T1','T2')
order by 1,2;
spool off
Real-time statistics were collected on T1. Note that I have column statistics on T1, but no histograms. I suppressed statistics collection on T2.
If I had run this test on an autonomous database then I would have had histograms because _optimizer_gather_stats_on_load_hist=TRUE
PREFERENCE_OVERRIDES_PARAMETER
------------------------------
FALSE
TABLE_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
---------- ----------- ---------- -------------------
T1 50000 50000 15:33:06 13/11/2020
T2
TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM LAST_ANALYZED
---------- ------------------------------ ------------ ----------- --------------- -------------------
T1 APPLICATION 1 1 NONE 15:33:06 13/11/2020
CREATED 520 1 NONE 15:33:06 13/11/2020
CREATED_APPID 0 0 NONE 15:33:06 13/11/2020
CREATED_VSNID 0 0 NONE 15:33:06 13/11/2020
DATA_OBJECT_ID 343 1 NONE 15:33:06 13/11/2020
DEFAULT_COLLATION 1 1 NONE 15:33:06 13/11/2020
DUPLICATED 1 1 NONE 15:33:06 13/11/2020
EDITIONABLE 1 1 NONE 15:33:06 13/11/2020
EDITION_NAME 0 0 NONE 15:33:06 13/11/2020
GENERATED 2 1 NONE 15:33:06 13/11/2020
LAST_DDL_TIME 736 1 NONE 15:33:06 13/11/2020
MODIFIED_APPID 0 0 NONE 15:33:06 13/11/2020
MODIFIED_VSNID 0 0 NONE 15:33:06 13/11/2020
NAMESPACE 7 1 NONE 15:33:06 13/11/2020
OBJECT_ID 50000 1 NONE 15:33:06 13/11/2020
OBJECT_NAME 45268 1 NONE 15:33:06 13/11/2020
OBJECT_TYPE 24 1 NONE 15:33:06 13/11/2020
ORACLE_MAINTAINED 2 1 NONE 15:33:06 13/11/2020
OWNER 8 1 NONE 15:33:06 13/11/2020
SECONDARY 1 1 NONE 15:33:06 13/11/2020
SHARDED 1 1 NONE 15:33:06 13/11/2020
SHARING 4 1 NONE 15:33:06 13/11/2020
STATUS 1 1 NONE 15:33:06 13/11/2020
SUBOBJECT_NAME 77 1 NONE 15:33:06 13/11/2020
TEMPORARY 2 1 NONE 15:33:06 13/11/2020
TIMESTAMP 602 1 NONE 15:33:06 13/11/2020
T2 APPLICATION NONE
CREATED NONE
CREATED_APPID NONE
CREATED_VSNID NONE
DATA_OBJECT_ID NONE
DEFAULT_COLLATION NONE
DUPLICATED NONE
EDITIONABLE NONE
EDITION_NAME NONE
GENERATED NONE
LAST_DDL_TIME NONE
MODIFIED_APPID NONE
MODIFIED_VSNID NONE
NAMESPACE NONE
OBJECT_ID NONE
OBJECT_NAME NONE
OBJECT_TYPE NONE
ORACLE_MAINTAINED NONE
OWNER NONE
SECONDARY NONE
SHARDED NONE
SHARING NONE
STATUS NONE
SUBOBJECT_NAME NONE
TEMPORARY NONE
TIMESTAMP NONE
I will now gather statistics on both tables with an explicit sample size and a METHOD_OPT that does not collect statistics on unindexed columns.
EXEC dbms_stats.gather_table_stats(user,'T1',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
EXEC dbms_stats.gather_table_stats(user,'T2',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
@tstats
- I can see that I got sample sizes close to 5000 for each table, and the number of rows is 10 times larger, so it was a 10% sample size.
- There is a unique index on each table on 4 columns. Only the column statistics for those 4 columns were updated. On T1, we can see the column statistics were collected at different times.
TABLE_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
---------- ----------- ---------- -------------------
T1 4937 49370 15:33:46 13/11/2020
T2 5013 50130 15:33:47 13/11/2020
TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM LAST_ANALYZED
---------- ------------------------------ ------------ ----------- --------------- -------------------
T1 APPLICATION 1 1 NONE 15:33:06 13/11/2020
CREATED 520 1 NONE 15:33:06 13/11/2020
CREATED_APPID 0 0 NONE 15:33:06 13/11/2020
CREATED_VSNID 0 0 NONE 15:33:06 13/11/2020
DATA_OBJECT_ID 343 1 NONE 15:33:06 13/11/2020
DEFAULT_COLLATION 1 1 NONE 15:33:06 13/11/2020
DUPLICATED 1 1 NONE 15:33:06 13/11/2020
EDITIONABLE 1 1 NONE 15:33:06 13/11/2020
EDITION_NAME 0 0 NONE 15:33:06 13/11/2020
GENERATED 2 1 NONE 15:33:06 13/11/2020
LAST_DDL_TIME 736 1 NONE 15:33:06 13/11/2020
MODIFIED_APPID 0 0 NONE 15:33:06 13/11/2020
MODIFIED_VSNID 0 0 NONE 15:33:06 13/11/2020
NAMESPACE 7 1 NONE 15:33:06 13/11/2020
OBJECT_ID 50000 1 NONE 15:33:06 13/11/2020
OBJECT_NAME 40572 1 NONE 15:33:46 13/11/2020
OBJECT_TYPE 18 10 HEIGHT BALANCED 15:33:46 13/11/2020
ORACLE_MAINTAINED 2 1 NONE 15:33:06 13/11/2020
OWNER 7 7 FREQUENCY 15:33:46 13/11/2020
SECONDARY 1 1 NONE 15:33:06 13/11/2020
SHARDED 1 1 NONE 15:33:06 13/11/2020
SHARING 4 1 NONE 15:33:06 13/11/2020
STATUS 1 1 NONE 15:33:06 13/11/2020
SUBOBJECT_NAME 8 1 NONE 15:33:46 13/11/2020
TEMPORARY 2 1 NONE 15:33:06 13/11/2020
TIMESTAMP 602 1 NONE 15:33:06 13/11/2020
T2 APPLICATION NONE
CREATED NONE
CREATED_APPID NONE
CREATED_VSNID NONE
DATA_OBJECT_ID NONE
DEFAULT_COLLATION NONE
DUPLICATED NONE
EDITIONABLE NONE
EDITION_NAME NONE
GENERATED NONE
LAST_DDL_TIME NONE
MODIFIED_APPID NONE
MODIFIED_VSNID NONE
NAMESPACE NONE
OBJECT_ID NONE
OBJECT_NAME 43403 1 NONE 15:33:47 13/11/2020
OBJECT_TYPE 14 10 HEIGHT BALANCED 15:33:47 13/11/2020
ORACLE_MAINTAINED NONE
OWNER 7 7 FREQUENCY 15:33:47 13/11/2020
SECONDARY NONE
SHARDED NONE
SHARING NONE
STATUS NONE
SUBOBJECT_NAME 10 1 NONE 15:33:47 13/11/2020
TEMPORARY NONE
TIMESTAMP NONE
Now I am going to enable PREFERENCE_OVERRIDES_PARAMETER at database-level, but I am also going to disable it for T1, so the parameters still override the preference. I would like histograms on the indexed columns on T2, so I am going to specify a METHOD_OPT table preference. If I mix FOR ALL COLUMNS and FOR ALL INDEXED COLUMNS, whichever is specified first will be overridden by the second. So instead, I must explicitly list the indexed columns.
connect / as sysdba
exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');
connect scott/tiger
exec dbms_stats.set_table_prefs(user,'T1','PREFERENCE_OVERRIDES_PARAMETER','FALSE');
exec dbms_stats.set_table_prefs(user,'T2','METHOD_OPT'
,'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 10 owner, object_type, object_name, subobject_name');
EXEC dbms_stats.gather_table_stats(user,'T1',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
EXEC dbms_stats.gather_table_stats(user,'T2',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
@tstats
- For T2, the sample size and the number of rows are both 50000. The new NDV algorithm has produced the correct answer. I didn't get any histograms other than on the columns I specified because there isn't any table usage yet.
- T1 used the 10% sample size and again only the statistics on the indexed columns were updated.
- I didn't get a histogram on T1 on OBJECT_NAME and SUBOBJECT_NAME, and I got a height-balanced histogram on OBJECT_TYPE because there were more distinct values than the number of buckets specified.
- However, on T2, Oracle collected hybrid histograms on OBJECT_NAME and SUBOBJECT and a top-frequency histogram on OBJECT_TYPE. This type of histogram is only gathered if ESTIMATE_PERCENT is AUTO_SAMPLE_SIZE.
PREFERENCE_OVERRIDES_PARAMETER
------------------------------
TRUE
TABLE_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
---------- ----------- ---------- -------------------
T1 5000 50000 15:56:58 13/11/2020
T2 50000 50000 15:56:59 13/11/2020
TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM LAST_ANALYZED
---------- ------------------------------ ------------ ----------- --------------- -------------------
T1 APPLICATION 1 1 NONE 15:33:06 13/11/2020
CREATED 520 1 NONE 15:33:06 13/11/2020
CREATED_APPID 0 0 NONE 15:33:06 13/11/2020
CREATED_VSNID 0 0 NONE 15:33:06 13/11/2020
DATA_OBJECT_ID 343 1 NONE 15:33:06 13/11/2020
DEFAULT_COLLATION 1 1 NONE 15:33:06 13/11/2020
DUPLICATED 1 1 NONE 15:33:06 13/11/2020
EDITIONABLE 1 1 NONE 15:33:06 13/11/2020
EDITION_NAME 0 0 NONE 15:33:06 13/11/2020
GENERATED 2 1 NONE 15:33:06 13/11/2020
LAST_DDL_TIME 736 1 NONE 15:33:06 13/11/2020
MODIFIED_APPID 0 0 NONE 15:33:06 13/11/2020
MODIFIED_VSNID 0 0 NONE 15:33:06 13/11/2020
NAMESPACE 7 1 NONE 15:33:06 13/11/2020
OBJECT_ID 50000 1 NONE 15:33:06 13/11/2020
OBJECT_NAME 42274 1 NONE 15:56:58 13/11/2020
OBJECT_TYPE 19 10 HEIGHT BALANCED 15:56:58 13/11/2020
ORACLE_MAINTAINED 2 1 NONE 15:33:06 13/11/2020
OWNER 8 8 FREQUENCY 15:56:58 13/11/2020
SECONDARY 1 1 NONE 15:33:06 13/11/2020
SHARDED 1 1 NONE 15:33:06 13/11/2020
SHARING 4 1 NONE 15:33:06 13/11/2020
STATUS 1 1 NONE 15:33:06 13/11/2020
SUBOBJECT_NAME 14 1 NONE 15:56:58 13/11/2020
TEMPORARY 2 1 NONE 15:33:06 13/11/2020
TIMESTAMP 602 1 NONE 15:33:06 13/11/2020
T2 APPLICATION 1 1 NONE 15:56:59 13/11/2020
CREATED 520 1 NONE 15:56:59 13/11/2020
CREATED_APPID 0 0 NONE 15:56:59 13/11/2020
CREATED_VSNID 0 0 NONE 15:56:59 13/11/2020
DATA_OBJECT_ID 343 1 NONE 15:56:59 13/11/2020
DEFAULT_COLLATION 1 1 NONE 15:56:59 13/11/2020
DUPLICATED 1 1 NONE 15:56:59 13/11/2020
EDITIONABLE 1 1 NONE 15:56:59 13/11/2020
EDITION_NAME 0 0 NONE 15:56:59 13/11/2020
GENERATED 2 1 NONE 15:56:59 13/11/2020
LAST_DDL_TIME 736 1 NONE 15:56:59 13/11/2020
MODIFIED_APPID 0 0 NONE 15:56:59 13/11/2020
MODIFIED_VSNID 0 0 NONE 15:56:59 13/11/2020
NAMESPACE 7 1 NONE 15:56:59 13/11/2020
OBJECT_ID 50000 1 NONE 15:56:59 13/11/2020
OBJECT_NAME 45268 10 HYBRID 15:56:59 13/11/2020
OBJECT_TYPE 24 10 TOP-FREQUENCY 15:56:59 13/11/2020
ORACLE_MAINTAINED 2 1 NONE 15:56:59 13/11/2020
OWNER 8 8 FREQUENCY 15:56:59 13/11/2020
SECONDARY 1 1 NONE 15:56:59 13/11/2020
SHARDED 1 1 NONE 15:56:59 13/11/2020
SHARING 4 1 NONE 15:56:59 13/11/2020
STATUS 1 1 NONE 15:56:59 13/11/2020
SUBOBJECT_NAME 77 10 HYBRID 15:56:59 13/11/2020
TEMPORARY 2 1 NONE 15:56:59 13/11/2020
TIMESTAMP 602 1 NONE 15:56:59 13/11/2020
Conclusion
I think there is a strong case for enabling PREFERENCE_OVERIDES_PARAMETER at database level on all databases from 12.2.
exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');
If your application explicitly collects statistics, or if you have legacy scripts that collect statistics, and either explicitly specify GATHER_TABLE_STATS parameters, then setting this parameter will revert them to the default. This is particularly valuable in the case of ESTIMATE_PERCENT as it will default to AUTO_SAMPLE_SIZE and you will get improved row count estimates based on the new 12c algorithm will less work.
If you don't have this problem in the first place, then enabling PREFERENCE_OVERIDES_PARAMETER at database level will prevent this problem from developing in the future!
If you have tables with requirements for particular statistics collections (e.g. METHOD_OPT, GRANULARITY, etc.) and you don't wish to simply use the defaults, then these variations should be implemented as table statistics preferences. If for some reason that is not possible, PREFERENCE_OVERIDES_PARAMETER can be disabled again at table level, also with a table statistics preference.
No comments :
Post a Comment