Monday, December 18, 2017

Hints, Patches, Force Matching and SQL Profiles

Sometimes, when all else fails, it is necessary to add hints to a piece of SQL in order to make the optimizer use a particular execution plan. We might directly add the hint to the code. However, even if it is possible to access the code directly, that may not be a good idea. In the future, if we need to change the hint, possibly due to a change in optimizer behaviour on database upgrade, then we would again need to make a code change.
Instead, we could look at Oracle's plan stability technologies.  There are 4 plan stability features in the database. They are different, but conceptually they all involve associating a set of hints with a statement that will be picked up by the optimizer at runtime.
  • SQL Outlines can be collected by the database at runtime. They can exactly match a SQL text or force match. However, they are deprecated from 11g. 
  • SQL Baselines can be collected by the database at runtime either on request or automatically. They feed real experience of previous executions of a statement back into subsequent executions. They match by SQL_ID. 
  • SQL Patches are a mechanism for manually inserting specific hints into a specific SQL ID 
  • SQL Profiles are also a mechanism for inserting specific hints into a specific SQL statement, but can also do forced matching. We are probably most used to creating profiles with Carlos Sierra's coe_xfr_sql_profile.sql script (part of Oracle support's SQLTXPLAIN), but as this blog will show that is not the only option. However, they require the database tuning pack to be licenced, which is not the case with the other mechanisms. 
There are several posts on Oracle's optimizer team that blog explains how to use SQL patches to inject a hint or disable an embedded hint.
However, the SQL patches work by attaching to a specific SQL_ID. If your application has literal values that change instead of bind variables you will not be able to inject hints with SQL patches.
The following example SQL was generated by PeopleSoft GL reporting tool, nVision. The code is dynamically generated by the tool, and there is no way to add a hint directly. The generated code contains literal values, in particular, the SELECTOR_NUM will be different for every execution. Therefore the SQL_ID will be different for every execution. Therefore, it is not possible to use a SQL Patch to inject a hint.
SELECT L2.TREE_NODE_NUM,L3.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT) 
FROM PS_XX_SUM_CONSOL_VW A, PSTREESELECT05 L2, PSTREESELECT10 L3 
WHERE A.LEDGER='S_USMGT' 
AND A.FISCAL_YEAR=2017 
AND A.ACCOUNTING_PERIOD BETWEEN 0 AND 12 
AND (A.DEPTID BETWEEN 'A0000' AND 'A8999' OR A.DEPTID BETWEEN 'B0000' AND 'B9149' 
OR A.DEPTID='B9156' OR A.DEPTID='B9158' OR A.DEPTID BETWEEN 'B9165' AND 'B9999' 
OR A.DEPTID BETWEEN 'C0000' AND 'C9999' OR A.DEPTID BETWEEN 'D0000' AND 'D9999' 
OR A.DEPTID BETWEEN 'G0000' AND 'G9999' OR A.DEPTID BETWEEN 'H0000' AND 'H9999' 
OR A.DEPTID='B9150' OR A.DEPTID=' ') 
AND L2.SELECTOR_NUM=10228 
AND A.BUSINESS_UNIT=L2.RANGE_FROM_05 
AND L3.SELECTOR_NUM=10231 
AND A.ACCOUNT=L3.RANGE_FROM_10 
AND A.CHARTFIELD1='0012345' 
AND A.CURRENCY_CD='GBP' 
GROUP BY L2.TREE_NODE_NUM,L3.TREE_NODE_NUM
/
If only the literal values differ, then the SQL statements will have the same force matching signature and one SQL profile will match all similar statements. Although, statements will not force match if the number of predicates changes. Nonetheless, I can inject hints with a SQL profile. This is the plan I get initially without a profile. It doesn't perform well, and it is not the plan I want.
Plan hash value: 808840077
-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |                   |       |       | 10408 (100)|          |       |       |
|   1 |  HASH GROUP BY                                |                   |   517 | 50666 | 10408   (1)| 00:00:01 |       |       |
|   2 |   HASH JOIN                                   |                   |   517 | 50666 | 10407   (1)| 00:00:01 |       |       |
|   3 |    PARTITION RANGE SINGLE                     |                   |   731 | 13158 |     3   (0)| 00:00:01 | 10228 | 10228 |
|   4 |     INDEX FAST FULL SCAN                      | PSAPSTREESELECT05 |   731 | 13158 |     3   (0)| 00:00:01 | 10228 | 10228 |
|   5 |    HASH JOIN                                  |                   |   518 | 41440 | 10404   (1)| 00:00:01 |       |       |
|   6 |     PARTITION RANGE SINGLE                    |                   |   249 |  5727 |     2   (0)| 00:00:01 | 10231 | 10231 |
|   7 |      INDEX FAST FULL SCAN                     | PSAPSTREESELECT10 |   249 |  5727 |     2   (0)| 00:00:01 | 10231 | 10231 |
|   8 |     PARTITION RANGE ITERATOR                  |                   |  7785 |   433K| 10402   (1)| 00:00:01 |    28 |    40 |
|   9 |      TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_X_LEDGER_ACCTS |  7785 |   433K| 10402   (1)| 00:00:01 |    28 |    40 |
|  10 |       SORT CLUSTER BY ROWID BATCHED           |                   |  5373 |       |  5177   (1)| 00:00:01 |       |       |
|  11 |        INDEX SKIP SCAN                        | PS_X_LEDGER_ACCTS |  5373 |       |  5177   (1)| 00:00:01 |    28 |    40 |
-----------------------------------------------------------------------------------------------------------------------------------
These are the hints I want to introduce. I want to force materialize view rewrite, invoke parallelism if the statement is estimated to run for at least 2 seconds, and use a Bloom filter on the materialized view.
SELECT /*+OPT_PARAM('parallel_degree_policy','AUTO') OPT_PARAM('parallel_min_time_threshold',2) 
OPT_PARAM('parallel_degree_limit',4) REWRITE PX_JOIN_FILTER(PS_XX_SUM_GCNSL_MV)*/…
This application produces many statements that I may want to control with a profile, but with forced matching this comes down to a not unmanageable number.  I have created a data-driven framework to create the profiles. I have created working storage table into which I will populate it with details of each force matching signature for which I want to create a profile.
CREATE TABLE dmk_fms_profiles
(force_matching_signature    NUMBER NOT NULL
,sql_id                      VARCHAR2(13) 
,plan_hash_value             NUMBER
,module                      VARCHAR2(64)
,report_id                   VARCHAR2(32) /*Application Specific*/
,tree_list                   CLOB         /*Application Specific*/
,sql_profile_name            VARCHAR2(30)
,parallel_min_time_threshold NUMBER
,parallel_degree_limit       NUMBER
,other_hints                 CLOB
,delete_profile              VARCHAR2(1)
,sql_text                    CLOB
,CONSTRAINT dmk_fms_profiles_pk PRIMARY KEY (force_matching_signature)
,CONSTRAINT dmk_fms_profiles_u1 UNIQUE (sql_id)
,CONSTRAINT dmk_fms_profiles_u2 UNIQUE (sql_profile_name)
)
/
Using conditional parallelism with the PARALLEL_MIN_TIME_THRESHOLD, but limited with PARALLEL_DEGREE_LIMIT is a frequently used tactic with this application, so I have specified columns for those hints, otherwise, hints are injected via a string. I identified the problematic SQL by analysis with ASH, and hence I also obtained the FORCE_MATCHING_SIGNATURE for them. The metadata is keyed by FORCE_MATCHING_SIGNATURE. I am specifying an additional hint to force materialized view rewrite and use a Bloom filter on the ledger table. I have specified a meaningful name for the SQL profile.
INSERT INTO dmk_fms_profiles (force_matching_signature, parallel_min_time_threshold, parallel_degree_limit, other_hints, sql_profile_name) 
VALUES (16625752171077499412, 1, 4, 'REWRITE PX_JOIN_FILTER(PS_FT_SUM_GCNSL_MV)', 'NVS_GBGL123I_BU_CONSOL_ACCOUNT');
COMMIT;
Profiles are created on the text of a SQL rather than the SQL_ID or FORCE_MATCHING_SIGNATURE directly. Therefore the SQL_TEXT must be extracted from the AWR, so this method also requires that the statement has been captured by an AWR snapshot.
UPDATE dmk_fms_profiles a
SET    (module, action, sql_id, plan_hash_value, sql_text)
=      (SELECT s.module, s.action, s.sql_id, s.plan_hash_value, t.sql_text
        FROM   dba_hist_sqlstat s
        ,      dba_hist_sqltext t
        WHERE  t.dbid = s.dbid
        AND    t.sql_id = s.sql_id
        AND    s.force_matching_signature = a.force_matching_signature
        AND    s.snap_id = (
   SELECT MAX(s1.snap_id)
  FROM   dba_hist_sqlstat s1
  WHERE  s1.force_matching_signature = a.force_matching_signature
  AND    s1.module = 'RPTBOOK'     /*Application Specific*/
   AND    s1.action LIKE 'PI=%:%:%' /*Application Specific*/)
  AND    s.module = 'RPTBOOK'             /*Application Specific*/
  AND    s.action LIKE 'PI=%:%:%'         /*Application Specific*/
  AND    ROWNUM = 1)
WHERE sql_id IS NULL
/

MERGE INTO dmk_fms_profiles u
USING (
SELECT a.sql_id, a.force_matching_signature, p.name
FROM   dmk_fms_profiles a
,      dba_sql_profiles p
WHERE  p.signature = a.force_matching_signature
) s
ON (s.force_matching_signature = u.force_matching_signature)
WHEN MATCHED THEN UPDATE
SET u.sql_profile_name = s.name
/
Columns REPORT_ID and TREE_LIST contain application specific information extracted from the application instrumentation and logging. They only mean something in the context of the application.
/*Application Specific - extract report ID from ACTION*/
UPDATE dmk_fms_profiles a
SET    report_id = substr(regexp_substr(s.action,':([A-Za-z0-9_-])+',1,1),2)
WHERE  report_id IS NULL
AND    action IS NOT NULL
/
Now I can produce a simple report of the metadata in order to see what profiles should be created.
column sql_text word_wrapped on format a110
column module format a8
column report_id heading 'nVision|Report ID'
column tree_list word_wrapped on format a20
column plan_hash_value             heading 'SQL Plan|Hash Value'         format 9999999999
column parallel_min_time_threshold heading 'Parallel|Min Time|Threshold' format 999
column parallel_degree_limit       heading 'Parallel|Degree|Limit'       format 999
set long 500
SELECT * FROM dmk_fms_profiles
/

                                          SQL Plan                                                                           
FORCE_MATCHING_SIGNATURE SQL_ID         Hash Value MODULE   ACTION                                                           
------------------------ ------------- ----------- -------- ---------------------------------------------------------------- 
                                                                                      Parallel Parallel
nVision                                                                               Min Time   Degree
Report ID                        TREE_LIST            SQL_PROFILE_NAME               Threshold    Limit D
-------------------------------- -------------------- ------------------------------ --------- -------- -
OTHER_HINTS
--------------------------------------------------------------------------------
SQL_TEXT
--------------------------------------------------------------------------------------------------------------
    12803175998948432502 5pzxhha3392cs   988048519 RPTBOOK  PI=3186222:USGL233I:10008                                        
USGL233I                         BU_GAAP_CONSOL,      NVS_GBGL123I_BU_CONSOL_ACCOUNT                   1        4
                                 GAAP_ACCOUNT
REWRITE PX_JOIN_FILTER(PS_XX_SUM_GCNSL_MV)
SELECT L2.TREE_NODE_NUM,A.ACCOUNT,SUM(A.POSTED_TOTAL_AMT) FROM PS_LEDGER A, PSTREESELECT05 L2, PSTREESELECT10 L3
WHERE A.LEDGER='S_GBMGT' AND A.FISCAL_YEAR=2017 AND A.ACCOUNTING_PERIOD BETWEEN 0 AND 12 AND (A.DEPTID BETWEEN
'A0000' AND 'A8999' OR A.DEPTID BETWEEN 'B0000' AND 'B9149' OR A.DEPTID='B9156' OR A.DEPTID='B9158' OR A.DEPTID
BETWEEN 'B9165' AND 'B9999' OR A.DEPTID BETWEEN 'C0000' AND 'C9999' OR A.DEPTID BETWEEN 'D0000' AND 'D9999' OR
A.DEPTID BETWEEN 'G0000' AND 'G9999' OR A.DE
Next, this PL/SQL block will create or recreate SQL profiles from the metadata. The various hints can be concatenated into a single string and passed as a parameter to SQLPROF_ATTR. The SQL text is passed as a parameter when the profile is created.
set serveroutput on
DECLARE
  l_signature NUMBER;
  h       SYS.SQLPROF_ATTR;
  e_no_sql_profile EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_no_sql_profile, -13833);
  l_description CLOB;
BEGIN

FOR i IN (
  SELECT f.*, s.name
  FROM   dmk_fms_profiles f
    LEFT OUTER JOIN dba_sql_profiles s
    ON f.force_matching_signature = s.signature
) LOOP

  BEGIN 
    IF i.name IS NOT NULL AND i.delete_profile = 'Y' THEN
      dbms_sqltune.drop_sql_profile(name => i.name);
    END IF;
    EXCEPTION WHEN e_no_sql_profile THEN NULL;
  END;

  IF i.delete_profile = 'Y' THEN 
    NULL;
  ELSIF i.sql_text IS NOT NULL THEN
    h := SYS.SQLPROF_ATTR(
q'[BEGIN_OUTLINE_DATA]',
CASE WHEN i.parallel_min_time_threshold>=0 THEN 'OPT_PARAM(''parallel_degree_policy'',''AUTO'') ' END||
CASE WHEN i.parallel_degree_limit      >=0 THEN 'OPT_PARAM(''parallel_degree_limit'','      ||i.parallel_degree_limit      ||') ' END||
CASE WHEN i.parallel_min_time_threshold>=0 THEN 'OPT_PARAM(''parallel_min_time_threshold'','||i.parallel_min_time_threshold||') ' END||
i.other_hints,
q'[END_OUTLINE_DATA]');

    l_signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(i.sql_text);
    l_description := 'coe nVision '||i.report_id||' '||i.tree_list||' '||i.force_matching_signature||'='||l_signature;
    dbms_output.put_line(i.sql_profile_name||' '||l_description);

    DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
sql_text    => i.sql_text,
profile     => h,
name        => i.sql_profile_name,
description => l_description,
category    => 'DEFAULT',
validate    => TRUE,
replace     => TRUE,
force_match => TRUE /* TRUE:FORCE (match even when different literals in SQL). FALSE:EXACT (similar to CURSOR_SHARING) */ );

  END IF;
END LOOP;
END;
/
I can verify that the profile has been created, and the hints that it contains, thus:
SELECT profile_name,
       xmltype(comp_data) as xmlval
FROM   dmk_fms_profiles p
,      dbmshsxp_sql_profile_attr  x
WHERE  x.profile_name = p.sql_profile_name
AND    p.status = 'ENABLED'
ORDER BY 1
/

PROFILE_NAME                                                                                                                                                                                            
------------------------------
XMLVAL                                                                                                                                                                                                  
------------------------------------------------------------------------------------------------
NVS_GBGL123I_BU_CONSOL_ACCOUNT
<![CDATA[BEGIN_OUTLINE_DATA]]>                                                                                                                                                           
  <![CDATA[OPT_PARAM('parallel_degree_policy','AUTO') OPT_PARAM('parallel_degree_limit',4) OPT_PARAM('parallel_min_time_threshold',1) REWRITE PX_JOIN_FILTER(PS_FT_SUM_GCNSL_MV)]]>                                           
  <![CDATA[END_OUTLINE_DATA]]>
And now when the application runs, I get the plan that I wanted.
  • The query runs in parallel.
  • The SQL is rewritten to use materialized view.
  • There are no indexes so it must full scan it.
  • It generates a bloom filter from PSTREESELECT10 and applies it to the materialized view.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                         | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                  |                    |       |       |  2219 (100)|          |       |       |        |      |            |
|   1 |  PX COORDINATOR                                   |                    |       |       |         |             |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)                             | :TQ10004           |   111 |  9879 |  2219   (6)| 00:00:01 |       |       |  Q1,04 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                                  |                    |   111 |  9879 |  2219   (6)| 00:00:01 |       |       |  Q1,04 | PCWP |            |
|   4 |     PX RECEIVE                                    |                    |   111 |  9879 |  2219   (6)| 00:00:01 |       |       |  Q1,04 | PCWP |            |
|   5 |      PX SEND HASH                                 | :TQ10003           |   111 |  9879 |  2219   (6)| 00:00:01 |       |       |  Q1,03 | P->P | HASH       |
|   6 |       HASH GROUP BY                               |                    |   111 |  9879 |  2219   (6)| 00:00:01 |       |       |  Q1,03 | PCWP |            |
|   7 |        HASH JOIN                                  |                    |   536 | 47704 |  2218   (6)| 00:00:01 |       |       |  Q1,03 | PCWP |            |
|   8 |         PX RECEIVE                                |                    |   536 | 38056 |  2215   (6)| 00:00:01 |       |       |  Q1,03 | PCWP |            |
|   9 |          PX SEND HYBRID HASH                      | :TQ10002           |   536 | 38056 |  2215   (6)| 00:00:01 |       |       |  Q1,02 | P->P | HYBRID HASH|
|  10 |           STATISTICS COLLECTOR                    |                    |       |       |         |             |       |       |  Q1,02 | PCWC |            |
|  11 |            HASH JOIN                              |                    |   536 | 38056 |  2215   (6)| 00:00:01 |       |       |  Q1,02 | PCWP |            |
|  12 |             BUFFER SORT                           |                    |       |       |         |             |       |       |  Q1,02 | PCWC |            |
|  13 |              JOIN FILTER CREATE                   | :BF0000            |   236 |  3776 |     2   (0)| 00:00:01 |       |       |  Q1,02 | PCWP |            |
|  14 |               PX RECEIVE                          |                    |   236 |  3776 |     2   (0)| 00:00:01 |       |       |  Q1,02 | PCWP |            |
|  15 |                PX SEND BROADCAST                  | :TQ10000           |   236 |  3776 |     2   (0)| 00:00:01 |       |       |        | S->P | BROADCAST  |
|  16 |                 PARTITION RANGE SINGLE            |                    |   236 |  3776 |     2   (0)| 00:00:01 | 36774 | 36774 |        |      |            |
|  17 |                  INDEX FAST FULL SCAN             | PSAPSTREESELECT10  |   236 |  3776 |     2   (0)| 00:00:01 | 36774 | 36774 |        |      |            |
|  18 |             JOIN FILTER USE                       | :BF0000            |  8859 |   475K|  2213   (6)| 00:00:01 |       |       |  Q1,02 | PCWP |            |
|  19 |              PX BLOCK ITERATOR                    |                    |  8859 |   475K|  2213   (6)| 00:00:01 |    29 |    41 |  Q1,02 | PCWC |            |
|  20 |               MAT_VIEW REWRITE ACCESS STORAGE FULL| PS_XX_SUM_GCNSL_MV |  8859 |   475K|  2213   (6)| 00:00:01 |    29 |    41 |  Q1,02 | PCWP |      
|  21 |         BUFFER SORT                               |                    |       |       |         |             |       |       |  Q1,03 | PCWC |            |
|  22 |          PX RECEIVE                               |                    |   731 | 13158 |     3   (0)| 00:00:01 |       |       |  Q1,03 | PCWP |            |
|  23 |           PX SEND HYBRID HASH                     | :TQ10001           |   731 | 13158 |     3   (0)| 00:00:01 |       |       |        | S->P | HYBRID HASH|
|  24 |            PARTITION RANGE SINGLE                 |                    |   731 | 13158 |     3   (0)| 00:00:01 | 36773 | 36773 |        |      |            |
|  25 |             INDEX FAST FULL SCAN                  | PSAPSTREESELECT05  |   731 | 13158 |     3   (0)| 00:00:01 | 36773 | 36773 |        |      |            |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Conclusion SQL 

Profiles can be used in much the same way as SQL Patches to introduce hints into application SQL without changing the code, the difference being that SQL Profiles can force match SQL. However, SQL Profiles do require the Tuning pack to be licenced, whereas SQL Patches do not. Some hints do not work because the profile is applied to late in the parse process. For example, if you use cursor sharing then the CURSOR_SHARING_EXACT hint will not take effect.

Friday, December 08, 2017

Changing Physical Index Attributes without Down Time

Normally, we make an index invisible before we drop it, in order to see whether we can manage without it, and if not we can make it visible again.  In this blog, I will demonstrate how to use index invisibility to introduce an index change that I cannot do with an online operation.  I am also able to reverse it immediately, whereas an on-line operation would take time.

Problem Statement

I have a large partitioned table, but the primary key index on it was not partitioned.  Testing has shown that performance would improve if the index was locally partitioned.  It is not possible to introduce the partitioning by rebuilding the index online.  I cannot afford the downtime to drop and rebuild the index, and anyway I want an easy way back to the original state in case of a problem.

Demonstration

I encountered this problem in a PeopleSoft system with a unique index, but here I will demonstrate it with a primary key constraint also.  I will create and populate a simple range partitioned table with a primary key constraint.
DROP TABLEt PURGE;

CREATE TABLE t
(a number not null
,b number not null
,c number
,d number) 
partition by range (a) interval (42)
(partition values less than (42))
/
ALTER TABLE t ADD CONSTRAINT t_pk PRIMARY KEY (a,b)
/
TRUNCATE TABLE t
/
INSERT INTO t
WITH x AS (select rownum n from dual connect by level <= 100)
SELECT a.n, b.n, a.n*b.n, a.n+b.n
FROM x a, x b
/
Note that table is partitioned, but the unique index is not. I haven't had to explicitly build it without partitioning.
set long 5000
select dbms_metadata.get_ddl('INDEX','T_PK')
from dual
/
CREATE UNIQUE INDEX "SCOTT"."T_PK" ON "SCOTT"."T" ("A", "B")
PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
TABLESPACE "USERS"
I cannot just create the new index because the column list is already indexed.
CREATE UNIQUE INDEX SCOTT.T_PK_NEW ON SCOTT.T (A, B)
                                               *
ERROR at line 1:
ORA-01408: such column list already indexed
So, now I will create a new unique index on the same columns, but I will create it invisible. I will also create it online so that I do not block the application from performing DML on the table while the index is built.
CREATE UNIQUE INDEX SCOTT.T_PK_NEW ON SCOTT.T (A, B)
LOCAL INVISIBLE ONLINE
/
So now I have two unique indexes, one visible, one invisible
set autotrace off
column constraint_name format a20
column table_name format a10
column index_name format a10
SELECT index_name, visibility FROM user_indexes WHERE table_name = 'T'
/

INDEX_NAME VISIBILIT
---------- ---------
T_PK       VISIBLE
T_PK_NEW   INVISIBLE
I cannot make the new index visible while the original index is also visible because I cannot have two unique indexes
ALTER INDEX SCOTT.T_PK_NEW VISIBLE
/
*
ERROR at line 1:
ORA-14147: There is an existing VISIBLE index defined on the same set of columns.
Instead, I have to make the original index invisible first. However, even if both indexes are invisible the unique constraint is still enforced.
ALTER INDEX SCOTT.T_PK INVISIBLE
/
INSERT INTO t VALUES (1,2,3,4)
/
*
ERROR at line 1:
ORA-00001: unique constraint (SCOTT.T_PK) violated

ALTER INDEX SCOTT.T_PK_NEW VISIBLE
/
Now my new index is visible.
SELECT index_name, visibility FROM user_indexes WHERE table_name = 'T'
/

INDEX_NAME VISIBILIT
---------- ---------
T_PK       INVISIBLE
T_PK_NEW   VISIBLE
The constraint still points to the original index.
INSERT INTO t VALUES(1,2,3,4)
/
*
ERROR at line 1:
ORA-00001: unique constraint (SCOTT.T_PK) violated

SELECT constraint_name, table_name, index_name
FROM   user_constraints
WHERE  constraint_name = 'T_PK'
/

CONSTRAINT_NAME      TABLE_NAME INDEX_NAME
-------------------- ---------- ----------
T_PK                 T          T_PK
However, queries now use the new index.
set autotrace on lines 160
SELECT * FROM t WHERE a = 1 and b=2
/
Plan hash value: 3024084754
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |          |     1 |    52 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE            |          |     1 |    52 |     2   (0)| 00:00:01 |     1 |     1 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| T        |     1 |    52 |     2   (0)| 00:00:01 |     1 |     1 |
|*  3 |    INDEX UNIQUE SCAN               | T_PK_NEW |     1 |       |     1   (0)| 00:00:01 |     1 |     1 |
---------------------------------------------------------------------------------------------------------------
The original index can now be removed. However, unique indexes used by primary key constraints cannot be dropped directly. Instead the modifying the primary key constraint drops the original index.
ALTER TABLE scott.t 
MODIFY CONSTRAINT t_pk USING INDEX t_pk_new
/

SELECT constraint_name, table_name, index_name
FROM   user_constraints
WHERE  constraint_name = 'T_PK'
/
CONSTRAINT_NAME      TABLE_NAME INDEX_NAME
-------------------- ---------- ----------
T_PK                 T          T_PK_NEW

SELECT index_name, visibility FROM user_indexes WHERE table_name = 'T'
/

INDEX_NAME VISIBILIT
---------- ---------
T_PK_NEW   VISIBLE
Finally, the new index can be renamed.
ALTER INDEX t_pk_new RENAME TO t_pk
/

Tuesday, January 10, 2017

Removing Unnecessary Indexes: 3.Cost of Maintaining Indexes

This is the third post in a series about unnecessary indexes and some of the challenges that they present.
In this post I want to look at how to quantify the overhead of index maintenance, and estimate the benefit of removing redundant indexes.
Let’s conduct an experiment. I will set up the same table and indexes as in the previous posts in this series, but this time I will put them into the RECYCLE buffer pool, to which I have allocated the minimum possible 4M of memory, and I will allocate 25% free space to the indexes. The idea is not to benefit from the buffer cache, but suffer the pain of the physical I/O.
CREATE TABLE t (a NUMBER, b NUMBER, c NUMBER, d NUMBER
,CONSTRAINT t_pk PRIMARY KEY (a) USING INDEX STORAGE (BUFFER_POOL RECYCLE)
) STORAGE(BUFFER_POOL RECYCLE);

CREATE INDEX t_ab  ON t (a,b)   STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;
CREATE INDEX t_bc  ON t (b,c)   STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;
CREATE INDEX t_bcd ON t (b,c,d) STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;

INSERT /*+APPEND*/ INTO t
WITH x AS (
SELECT  rownum-1 n FROM DUAL connect by level <= 1E5)
SELECT  n
, MOD(n,100)
, ROUND(MOD(n,100),-1)
, dbms_random.value(1,100)
FROM   x
/
CREATE INDEX T_BCD_ROUND ON T (B, C, ROUND(D,0)) STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;
CREATE INDEX T_BCD_DESC  ON T (B, C, D DESC)     STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;

EXEC dbms_stats.gather_table_stats(null,'T',cascade=>TRUE,method_opt=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 1 (B,C)');
The table and indexes really too large to fit in the recycle buffer pool, so there will be physical read as blocks are loaded into the buffer cache, and physical write as dirty blocks are pushed out.
SELECT s.owner, s.segment_name
,      s.blocks, s.bytes/1024/1024 Mb
,      t.blocks
FROM   dba_segments s
,      dba_tables t
WHERE  t.table_name = 'T'
AND    s.segment_type = 'TABLE'
AND    s.segment_name = t.table_name
/

OWNER      SEGMENT_NAME     BLOCKS         MB     BLOCKS
---------- ------------ ---------- ---------- ----------
SYSADM     T                   640          5        543

SELECT  s.owner, s.segment_name
,       s.blocks, s.bytes/1024/1024 Mb
,       i.leaf_blocks
FROM    dba_segments s
,       dba_indexes i
WHERE   s.owner = i.owner
AND     s.segment_name = i.index_name
AND     i.table_name = 'T'
/

OWNER      SEGMENT_NAME     BLOCKS         MB LEAF_BLOCKS
---------- ------------ ---------- ---------- -----------
SYSADM     T_PK                256          2         187
SYSADM     T_AB                256          2         237
SYSADM     T_BC                256          2         213
SYSADM     T_BCD               512          4         488
SYSADM     T_BCD_ROUND         384          3         336
SYSADM     T_BCD_DESC          768          6         675
Now I will run a PL/SQL loop that randomly updates rows in the table with random values, 500,000 times. The random nature of the update will minimize the benefit of the caching.  The database will continuously have to load blocks from disk back into buffer cache.
DECLARE 
  l_counter INTEGER := 0;
  l_t1 DATE;
  l_r1 INTEGER;
  l_r2 INTEGER;
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
BEGIN
  dbms_application_info.read_module(l_module, l_action);
  dbms_application_info.set_module('REDUNDANT','UPDATE TEST');
  l_t1 := SYSDATE + &&runtime/86400;
  WHILE l_t1 >= SYSDATE AND l_counter < 5e5 LOOP
    l_r1 := round(dbms_random.value(1,100),0);
    l_r2 := round(dbms_random.value(1,100),0);
    UPDATE t
    SET    c = l_r2
    ,      d = ROUND(l_r2,-1)
    WHERE  a = l_r1;
    l_counter := l_counter + 1;
    COMMIT;
  END LOOP; 
  dbms_output.put_line(l_counter||' rows updated');
  dbms_application_info.set_module(l_module, l_action);
END;
/
The activity in the buffer cache is visible via v$buffer_pool_statistics.
select * from v$buffer_pool_statistics where name = 'RECYCLE';
Before:
        ID NAME                 BLOCK_SIZE  SET_MSIZE  CNUM_REPL CNUM_WRITE   CNUM_SET    BUF_GOT  SUM_WRITE   SUM_SCAN
---------- -------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
FREE_BUFFER_WAIT WRITE_COMPLETE_WAIT BUFFER_BUSY_WAIT FREE_BUFFER_INSPECTED DIRTY_BUFFERS_INSPECTED DB_BLOCK_CHANGE
---------------- ------------------- ---------------- --------------------- ----------------------- ---------------
DB_BLOCK_GETS CONSISTENT_GETS PHYSICAL_READS PHYSICAL_WRITES
------------- --------------- -------------- ---------------
         2 RECYCLE                    8192        490        490          0        490     482786     758695          0
           65466               22517                0                889060                  612270        49971577
     71349961        18260691         385878          758695
After:
         2 RECYCLE                    8192        490        459         31        490     518646     853441          0
           73457               25942                0                981089                  688777        57003938
     81763622        19260981         413463          853441
We can see there have been physical reads and writes on objects in the recycle pool.
This is the execution plan of the statement in the PL/SQL loop. The update occurs in line 1 of the plan.
SQL_ID  3f305xbd6ts1d, child number 1
-------------------------------------
UPDATE T SET C = :B2 , D = ROUND(:B2 ,-1) WHERE A = :B1

Plan hash value: 795017363
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  UPDATE            | T    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| T_PK |     1 |    33 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Profile DML Overhead

Now, I will profile the recent ASH data for my test (filtered by module and action) by object and event, but we will only look at line 1 of execution plans where the SQL_OPNAME indicates a DML statement. Thus we are looking exclusively at the overhead of modifying data, and not the overhead of finding it in the first place.
The current_obj# and current_file# is used to identify database segments and tablespace name. However, these columns are reliable only for wait events that relate to physical I/O. They are not, for example, valid for CPU events.
column sql_id format a13
column sql_plan_hash_value heading 'SQL Plan|Hash Value'
column sql_plan_line_id heading 'SQL Plan|Line ID'
column top_level_sql_id format a13 heading 'Top Level|SQL ID'
column event format a25
column object_type format a6 heading 'Object|Type'
column object_name format a11
column p1text format a9
column p2text format a6
column p3text format a7
column action format a11
column ash_secs heading 'ASH|secs' format 9999
column current_file# heading 'File|#' format 9999
column current_obj# heading 'Curr|Obj#' format 999999
compute sum of ash_secs on report
break on report
WITH x AS (
SELECT event, action, current_obj#, current_file#
,      p1text, p2text, p3text
,      SUM(1) ash_secs
FROM   v$active_session_history
WHERE  module = 'REDUNDANT'
AND    sample_time >= SYSDATE - (10+&&runtime)/86400
AND    sql_plan_line_id = 1 /*line 1 is where the update occurs*/
AND    sql_opname IN('INSERT','UPDATE','DELETE','UPSERT') /*DML Statement*/
GROUP BY event, action, current_obj#, current_file#, p1text, p2text, p3text
)
SELECT x.*
,      o.object_type, o.object_name
,      f.tablespace_name
FROM   x
LEFT OUTER JOIN dba_objects o 
 ON  o.object_id = x.current_obj# 
 AND event is not null
LEFT OUTER JOIN dba_data_files f 
 ON  f.file_id = x.current_file# 
 AND event is not null
AND x.p1text = 'file#'
AND x.p2text = 'block#'
ORDER BY ash_secs DESC 
/
The test ran for 314 seconds, though we only spent 210 seconds updating the table. However, we can see how much time was spent time of writing to various indexes. Note that T_BC is still maintained even though it is invisible.
                                         Curr  File                            ASH Object
EVENT                     ACTION         Obj#     # P1TEXT    P2TEXT P3TEXT   secs Type   OBJECT_NAME TABLESPACE_NAME
------------------------- ----------- ------- ----- --------- ------ ------- ----- ------ ----------- -----------------------
free buffer waits         UPDATE TEST      -1     0 file#     block# set-id#    44
                          UPDATE TEST      -1     0 file#     block# blocks     28
write complete waits      UPDATE TEST   99411     4 file#     block#            19 INDEX  T_BC        PSDEFAULT
free buffer waits         UPDATE TEST   99412     0 file#     block# set-id#    14 INDEX  T_BCD
write complete waits      UPDATE TEST   99414     4 file#     block#            14 INDEX  T_BCD_DESC  PSDEFAULT
free buffer waits         UPDATE TEST   99411     0 file#     block# set-id#    12 INDEX  T_BC
write complete waits      UPDATE TEST   99412     4 file#     block#            12 INDEX  T_BCD       PSDEFAULT
write complete waits      UPDATE TEST   99413     4 file#     block#            11 INDEX  T_BCD_ROUND PSDEFAULT
                          UPDATE TEST   99411     0 file#     block# blocks     11
                          UPDATE TEST   99413     0 file#     block# blocks      8
free buffer waits         UPDATE TEST   99413     0 file#     block# set-id#     7 INDEX  T_BCD_ROUND
free buffer waits         UPDATE TEST   99414     0 file#     block# set-id#     7 INDEX  T_BCD_DESC
                          UPDATE TEST   99414     0 file#     block# blocks      5
                          UPDATE TEST   99412     0 file#     block# blocks      5
free buffer waits         UPDATE TEST   99412     4 file#     block# set-id#     3 INDEX  T_BCD       PSDEFAULT
log file switch (private  UPDATE TEST   99411     4                              1 INDEX  T_BC
strand flush incomplete)
free buffer waits         UPDATE TEST   99411     4 file#     block# set-id#     1 INDEX  T_BC        PSDEFAULT
                          UPDATE TEST   99413     4 file#     block# blocks      1
                          UPDATE TEST      -1     0 file#     block#             1
                          UPDATE TEST   99411     4 file#     block# blocks      1
                          UPDATE TEST   99411     0 file#     block#             1
log file switch completio UPDATE TEST   99414     0                              1 INDEX  T_BCD_DESC
free buffer waits         UPDATE TEST      -1     4 file#     block# set-id#     1                    PSDEFAULT
                          UPDATE TEST      -1     4 file#     block# blocks      1
                          UPDATE TEST   99414     0 file#     block#             1
                                                                             -----
sum                                                                            210
Now I know where time was spent maintaining which indexes, and I can decide whether it is worth dropping a particular index.  In all, 33 seconds were spent maintaining index T_BC.
When I repeated the test, having dropped index T_BC, the runtime of the test goes down by 21 seconds to 293 seconds. The time spent maintaining other indexes also goes down. Time spent on index T_BCD went from 29 seconds to just 9 seconds. I speculate that this is due to better cache efficiency because it no longer has to cache index T_BC.
More time was spent on other wait events that are not directly associated with the index, so other forms of contention were introduced.  However, the performance of the test as a whole improved, so we are in a better place.
                                         Curr  File                            ASH Object
EVENT                     ACTION         Obj#     # P1TEXT    P2TEXT P3TEXT   secs Type   OBJECT_NAME TABLESPACE_NAME
------------------------- ----------- ------- ----- --------- ------ ------- ----- ------ ----------- ---------------
free buffer waits         UPDATE TEST      -1     0 file#     block# set-id#    41
                          UPDATE TEST      -1     0 file#     block# blocks     39
                          UPDATE TEST   99419     0 file#     block# blocks     12
                          UPDATE TEST   99418     0 file#     block# blocks     11
free buffer waits         UPDATE TEST   99411     0 file#     block# set-id#    10
write complete waits      UPDATE TEST   99413     4 file#     block#            10                    PSDEFAULT
                          UPDATE TEST   99411     0 file#     block# blocks      9
free buffer waits         UPDATE TEST   99412     0 file#     block# set-id#     9
write complete waits      UPDATE TEST   99414     4 file#     block#             8                    PSDEFAULT
write complete waits      UPDATE TEST   99411     4 file#     block#             8                    PSDEFAULT
write complete waits      UPDATE TEST   99418     4 file#     block#             6 INDEX  T_BCD       PSDEFAULT
write complete waits      UPDATE TEST   99412     4 file#     block#             5                    PSDEFAULT
                          UPDATE TEST   99414     0 file#     block# blocks      4
free buffer waits         UPDATE TEST   99414     0 file#     block# set-id#     4
free buffer waits         UPDATE TEST   99413     0 file#     block# set-id#     3
write complete waits      UPDATE TEST   99419     4 file#     block#             3 INDEX  T_BCD_ROUND PSDEFAULT
                          UPDATE TEST   99412     0 file#     block# blocks      3
                          UPDATE TEST   99413     0 file#     block# blocks      3
                          UPDATE TEST   99420     0 file#     block# blocks      2
free buffer waits         UPDATE TEST   99418     0 file#     block# set-id#     2 INDEX  T_BCD
write complete waits      UPDATE TEST   99420     4 file#     block#             1 INDEX  T_BCD_DESC  PSDEFAULT
                          UPDATE TEST   99411     4 file#     block# blocks      1
write complete waits      UPDATE TEST   99415     4 file#     block#             1 TABLE  T           PSDEFAULT
                          UPDATE TEST   99418     0 file#     block#             1
                          UPDATE TEST   99414     0 file#     block#             1
log file switch (private  UPDATE TEST   99418     0                              1 INDEX  T_BCD
strand flush incomplete)
                          UPDATE TEST   99411     0 file#     block#             1
                          UPDATE TEST   99418     0                              1
free buffer waits         UPDATE TEST   99412     4 file#     block# set-id#     1                    PSDEFAULT
free buffer waits         UPDATE TEST      -1     4 file#     block# set-id#     1                    PSDEFAULT
                          UPDATE TEST      -1     0 file#     block#             1
free buffer waits         UPDATE TEST   99420     0 file#     block# set-id#     1 INDEX  T_BCD_DESC
                                                                             -----
sum                                                                            204

Friday, January 06, 2017

Removing Unnecessary Indexes: 2. Identifying Redundant Indexes

This is the second post in a series about unnecessary indexes and some of the challenges that they present.
The EDB360 utility (see described on Carlos Sierra's Tools & Tips blog) contains a report of redundant indexes within a database. The query in this post (also available on my website) is based on the one in EDB360, but here the column list is produced with the LISTAGG analytic function.
 I have also extended it to handle function-based (and therefore descending) indexes. The detail of any function in the index is obtained from DBA_STAT_EXTENTIONS (where it is held in a CLOB column) rather than DBA_IND_EXPRESSIONS (where it is in a LONG column).
(update 18.1.2017) Carlos has added this query to EDB360 v1702 in Redundant Indexes (2)
REM redundantindex.sql
set lines 140 pages 99 trimspool on timi on
ttitle 'Redundant Indexes'
column table_owner      format a15 heading 'Table Owner'
column table_name       format a30 heading 'Table Name'
column index_name       format a20 heading 'Index Name'
column index_type       format a21 heading 'Index Type'
column superset_index   format a50 heading 'Superset Index' 
column redundant_index  format a40 heading 'Redundant Index'
column extension_name   format a30 heading 'Extension Name'
column visibility       heading 'Visi-|bility'
column constraint_type  format a4 heading 'Cons|Type' 
column constraint_name  format a30 heading 'Constraint Name'
spool redundantindex

WITH f AS ( /*function expressions*/
SELECT /*+MATERIALIZE*/ owner, table_name, extension, extension_name
FROM   dba_stat_extensions
where  creator = 'SYSTEM' /*exclude extended stats*/
), ic AS ( /*list indexed columns getting expressions from stat_extensions*/
SELECT /*+ MATERIALIZE*/ i.table_owner, i.table_name,
       i.owner index_owner, i.index_name,
       i.index_type, i.uniqueness, i.visibility,
       c.column_position,
       CASE WHEN f.extension IS NULL THEN c.column_name
            ELSE CAST(SUBSTR(REPLACE(SUBSTR(f.extension,2,LENGTH(f.extension)-2),'"',''),1,128) AS VARCHAR2(128))
       END column_name
  FROM dba_indexes i
     , dba_ind_columns c
       LEFT OUTER JOIN f
       ON f.owner = c.table_owner
       AND f.table_name = c.table_name
       AND f.extension_name = c.column_name
 WHERE c.table_owner NOT IN ('ANONYMOUS','APEX_030200','APEX_040000','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')
   AND c.table_owner NOT IN ('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL','PERFSTAT','STDBYPERF')
   AND C.table_name = 'T' 
   AND i.table_name = c.table_name
   AND i.owner = c.index_owner
   AND i.index_name = c.index_name
   AND i.index_type like '%NORMAL'
), i AS ( /*construct column list*/
SELECT /*+ MATERIALIZE*/
       ic.table_owner, ic.table_name,
       ic.index_owner, ic.index_name,
       ic.index_type, ic.uniqueness, ic.visibility,
       listagg(ic.column_name,',') within group (order by ic.column_position) AS column_list,
       '('||listagg('"'||ic.column_name||'"',',') within group (order by ic.column_position)||')' AS extension,
       count(*) num_columns
FROM ic
GROUP BY 
       ic.table_owner, ic.table_name,
       ic.index_owner, ic.index_name,
       ic.index_type, ic.uniqueness, ic.visibility
), e AS ( /*extended stats*/
SELECT /*+MATERIALIZE*/ owner, table_name, CAST(SUBSTR(extension,1,128) AS VARCHAR2(128)) extension, extension_name
FROM   dba_stat_extensions
where  creator = 'USER' /*extended stats not function based indexes*/
) 
SELECT r.table_owner, r.table_name,
       i.index_name||' ('||i.column_list||')' superset_index,
       r.index_name||' ('||r.column_list||')' redundant_index,
       c.constraint_type, c.constraint_name,
       r.index_type, r.visibility, e.extension_name
  FROM i r
       LEFT OUTER JOIN e
         ON  e.owner = r.table_owner
         AND e.table_name = r.table_name
         AND e.extension = r.extension
       LEFT OUTER JOIN dba_constraints c
         ON c.table_name = r.table_name
         AND c.index_owner = r.index_owner
         AND c.index_name = r.index_name
         AND c.owner = r.table_owner
         AND c.constraint_type IN('P','U')
     , i
 WHERE i.table_owner = r.table_owner
   AND i.table_name = r.table_name
   AND i.index_name != r.index_name
   AND i.column_list LIKE r.column_list||',%'
   AND i.num_columns > r.num_columns
 ORDER BY r.table_owner, r.table_name, r.index_name, i.index_name
/
spool off/
I added two more indexes to the example in the previous post, so I can demonstrate the script working with descending and other function-based indexes.
CREATE INDEX T_BCD_ROUND ON T (B, C, ROUND(D,0));
CREATE INDEX T_BCD_DESC  ON T (B, C, D DESC);
The report also shows whether the redundant index is already invisible, whether there are extended statistics on it, and whether there is a primary key or unique constraint than could be moved to the superset index..
Sat Jan 07                                                                                                                         page    1
                                                             Redundant Indexes

Table Owner     Table Name                     Superset Index                                     Redundant Index
--------------- ------------------------------ -------------------------------------------------- ----------------------------------------
Cons                                                      Visi-
Type Constraint Name                Index Type            bility    Extension Name
---- ------------------------------ --------------------- --------- ------------------------------
SYSADM          T                              T_BCD (B,C,D)                                      T_BC (B,C)
                                    NORMAL                INVISIBLE SYS_STU3$58HONF9VK$$69P2OW4P4X

SYSADM          T                              T_BCD_DESC (B,C,D)                                 T_BC (B,C)
                                    NORMAL                INVISIBLE SYS_STU3$58HONF9VK$$69P2OW4P4X

SYSADM          T                              T_BCD_ROUND (B,C,ROUND(D,0))                       T_BC (B,C)
                                    NORMAL                INVISIBLE SYS_STU3$58HONF9VK$$69P2OW4P4X

SYSADM          T                              T_AB (A,B)                                         T_PK (A)
P    T_PK                           NORMAL                VISIBLE
…

Monday, January 02, 2017

Removing Unnecessary Indexes: 1. The Need for Extended Statistics

This is the first post in a series about unnecessary indexes and some of the challenges that they present 

I have always noted that it is remarkably easy to add indexes to systems, especially OLTP systems. However, removing them is generally more difficult.
The reward for removing an unused or at least an unnecessary index is that you no longer need to maintain it during DML operations, thus reducing I/O and not loading index blocks into the buffer cache. However, the risk is that performance will degrade somewhere because an execution plan changes.
I will start by looking at a class of indexes that are easily identified as candidates for removal. If the columns in a non-unique index are the same as the leading columns in another index then it is reasonable to remove it because any query that uses it could also use the other index. We cannot drop unique indexes because they are needed to enforce a unique or primary key constraint.
However, there is a catch. A multi-column index might be used by the optimizer to provide the number of distinct keys without being referenced in the execution plan. Here is a simple example.
CREATE TABLE t 
(a NUMBER
,b NUMBER
,c NUMBER
,d NUMBER
,CONSTRAINT t_pk PRIMARY KEY (a)
);

CREATE INDEX t_ab  ON t (a,b);
CREATE INDEX t_bc  ON t (b,c);
CREATE INDEX t_bcd ON t (b,c,d);
  • T_BC is a subset of T_BCD, so it is redundant and could be dropped.
  • (correction 7.1.2017) The primary key index T_PK is a subset of T_AB.  It is also redundant because the primary key constraint, on A alone, could be altered to use index T_AB.
ALTER TABLE t MODIFY PRIMARY KEY USING INDEX t_ab;
However, I am going to leave the default primary key index in place for the rest of this demonstration.  I will now populate the table and gather statistics but not histograms.
INSERT /*+APPEND*/ INTO t
WITH x AS (
SELECT  rownum-1 n FROM DUAL connect by level <= 1E5)
SELECT  n
, MOD(n,100)
, ROUND(MOD(n,100),-1)
, dbms_random.value(1,100)
FROM   x;
EXEC dbms_stats.gather_table_stats(null,'T',cascade=>TRUE,method_opt=>'FOR ALL COLUMNS SIZE 1');
The table has 100,000 rows, column B has 100 distinct, and column C has 11 distinct values. But there is a close correlation between them. The value of C is the value of B rounded off to the nearest 10.
column column_name format a20
SELECT column_name, num_distinct, num_buckets, num_nulls, histogram 
FROM user_tab_col_statistics
WHERE table_name = 'T'
ORDER BY 1
/

COLUMN_NAME         NUM_DISTINCT NUM_BUCKETS  NUM_NULLS HISTOGRAM                                            
-------------------- ------------ ----------- ---------- ---------------
A                         100000           1          0 NONE
B                            100           1          0 NONE
C                             11           1          0 NONE
D                         100000           1          0 NONE
We can also see that index T_BC has 100 distinct keys.
SELECT index_name, uniqueness, visibility, distinct_keys, clustering_factor
FROM user_indexes WHERE table_name = 'T'
/

INDEX_NAME UNIQUENES VISIBILIT DISTINCT_KEYS CLUSTERING_FACTOR
---------- --------- --------- ------------- -----------------
T_PK       UNIQUE    VISIBLE          100000               525
T_AB       NONUNIQUE VISIBLE          100000               525
T_BC       NONUNIQUE VISIBLE             100             52500
T_BCD      NONUNIQUE VISIBLE          100000             99894
Let's look at the execution plans for some simple queries: This first query uses index T_BC because. It doesn't even need to look up the table because column A is not null.
SELECT COUNT(A) FROM t WHERE b = 42 AND c = 40          
    
Plan hash value: 2931606879
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       5 |
|   1 |  SORT AGGREGATE   |      |      1 |      1 |     6 |            |          |      1 |00:00:00.01 |       5 |
|*  2 |   INDEX RANGE SCAN| T_BC |      1 |   1000 |  6000 |     3   (0)| 00:00:01 |   1000 |00:00:00.01 |       5 |
--------------------------------------------------------------------------------------------------------------------
This query references column D, which is nullable, so it uses index T_BCD.
SELECT COUNT(D) FROM t WHERE b = 42 AND c = 40

Plan hash value: 2099617883
--------------------------------------------------------------------------------------------------------------------- 
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT  |       |      1 |        |       |     8 (100)|          |      1 |00:00:00.01 |       8 | 
|   1 |  SORT AGGREGATE   |       |      1 |      1 |    28 |            |          |      1 |00:00:00.01 |       8 | 
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |   1000 | 28000 |     8   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |
---------------------------------------------------------------------------------------------------------------------
Note that in both cases the optimizer estimated it would receive 1000 rows, and it did.

Invisible Indexes 

Rather than drop index T_BC, I am going to make it invisible and repeat the test. I would always do this in a real production situation before dropping the index later. If removing the index does have undesirable consequences it can instantly be made visible again, whereas rebuilding it may take time and cause contention on a large and active table.
ALTER INDEX t_bc INVISIBLE;
The execution plan for the first query has changed to use T_BCD (which is perfectly reasonable), but the estimated number of rows is now just 91, not the 1000 that were actually found. Although it didn't make a difference to this execution plan, it is this misestimate which could cause plan regressions in more complex cases.
SELECT COUNT(A) FROM t WHERE b = 42 AND c = 40          

Plan hash value: 2099617883                                   
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       8 |
|   1 |  SORT AGGREGATE   |       |      1 |      1 |     6 |            |          |      1 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |     91 |   546 |     3   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |
---------------------------------------------------------------------------------------------------------------------
The other query produces the same plan, with the same misestimate of the number of rows.
SELECT COUNT(D) FROM t WHERE b = 42 AND c = 40          
    
Plan hash value: 2099617883                                   
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |       |     3 (100)|          |      1 |00:00:00.02 |       8 |
|   1 |  SORT AGGREGATE   |       |      1 |      1 |    28 |            |          |      1 |00:00:00.02 |       8 |
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |     91 |  2548 |     3   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |
---------------------------------------------------------------------------------------------------------------------
The optimizer misestimates of the number of rows because it can no longer use index T_BC to tell it that there are only 100 distinct values on the table for the combination of columns B and C.

Extended Statistics

Now I will create extended statistics on the table
select dbms_stats.create_extended_stats(null,'T','(B,C)') from dual;
EXEC dbms_stats.gather_table_stats(null,'T',cascade=>TRUE,method_opt=>'FOR ALL COLUMNS SIZE 1');
Or I could just have done this. It comes to the same.
EXEC dbms_stats.gather_table_stats(null,'T',cascade=>TRUE,method_opt=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 1 (B,C)');
The extended statistics on the combination of columns B and C tells the optimizer that there are 100 distinct values for these columns. Note that the extended statistics only have a single bucket.
COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS  NUM_NULLS HISTOGRAM
------------------------------ ------------ ----------- ---------- -----------
A                                    100000           1          0 NONE
B                                       100           1          0 NONE
C                                        11           1          0 NONE
D                                    100000           1          0 NONE
SYS_STU3$58HONF9VK$$69P2OW4P4X          100           1          0 NONE

SELECT * FROM user_stat_extensions WHERE table_name = 'T'

TABLE_NAME EXTENSION_NAME                 EXTENSION            CREATO DRO
---------- ------------------------------ -------------------- ------ ---
T          SYS_STU3$58HONF9VK$$69P2OW4P4X ("B","C")            USER   YES
Now the optimizer reverts to correctly estimating the number of rows as 1000.
SELECT COUNT(A) FROM t WHERE b = 42 AND c = 40                                                                    
                                                                                                                        
Plan hash value: 2099617883                                                                                             
---------------------------------------------------------------------------------------------------------------------   
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |   
---------------------------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |       |      1 |        |       |     8 (100)|          |      1 |00:00:00.01 |       8 |   
|   1 |  SORT AGGREGATE   |       |      1 |      1 |     6 |            |          |      1 |00:00:00.01 |       8 |   
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |   1000 |  6000 |     8   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |   
---------------------------------------------------------------------------------------------------------------------   

SELECT COUNT(D) FROM t WHERE b = 42 AND c = 40                                                                    
                                                                                                                        
Plan hash value: 2099617883 
---------------------------------------------------------------------------------------------------------------------   
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |   
---------------------------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |       |      1 |        |       |     8 (100)|          |      1 |00:00:00.02 |       8 |   
|   1 |  SORT AGGREGATE   |       |      1 |      1 |    28 |            |          |      1 |00:00:00.02 |       8 |   
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |   1000 | 28000 |     8   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |   
---------------------------------------------------------------------------------------------------------------------   

Conclusion

  • If you are going to drop a redundant multi-column index, at least replace it with extended statistics on that combination of columns.
  • If you are going to add columns to an existing index rather than create another, perhaps to satisfy a query without the need to visit the table, then again consider creating extended statistics on the original list of columns.
Acknowledgement: my thanks to Jonathan Lewis for his blog: Index Usage, and for prompting the correction.