The /*+Go-Faster*/ Oracle Blog: January 2017

Tuesday, January 10, 2017

Removing Unnecessary Indexes: 3.Cost of Maintaining Indexes

This is the third post in a series about unnecessary indexes and some of the challenges that they present.
In this post I want to look at how to quantify the overhead of index maintenance, and estimate the benefit of removing redundant indexes.
Let’s conduct an experiment. I will set up the same table and indexes as in the previous posts in this series, but this time I will put them into the RECYCLE buffer pool, to which I have allocated the minimum possible 4M of memory, and I will allocate 25% free space to the indexes. The idea is not to benefit from the buffer cache, but suffer the pain of the physical I/O.

CREATE TABLE t (a NUMBER, b NUMBER, c NUMBER, d NUMBER
,CONSTRAINT t_pk PRIMARY KEY (a) USING INDEX STORAGE (BUFFER_POOL RECYCLE)
) STORAGE(BUFFER_POOL RECYCLE);

CREATE INDEX t_ab  ON t (a,b)   STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;
CREATE INDEX t_bc  ON t (b,c)   STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;
CREATE INDEX t_bcd ON t (b,c,d) STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;

INSERT /*+APPEND*/ INTO t
WITH x AS (
SELECT  rownum-1 n FROM DUAL connect by level <= 1E5)
SELECT  n
, MOD(n,100)
, ROUND(MOD(n,100),-1)
, dbms_random.value(1,100)
FROM   x
/
CREATE INDEX T_BCD_ROUND ON T (B, C, ROUND(D,0)) STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;
CREATE INDEX T_BCD_DESC  ON T (B, C, D DESC)     STORAGE(BUFFER_POOL RECYCLE) PCTFREE 25;

EXEC dbms_stats.gather_table_stats(null,'T',cascade=>TRUE,method_opt=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 1 (B,C)');

The table and indexes really too large to fit in the recycle buffer pool, so there will be physical read as blocks are loaded into the buffer cache, and physical write as dirty blocks are pushed out.

SELECT s.owner, s.segment_name
,      s.blocks, s.bytes/1024/1024 Mb
,      t.blocks
FROM   dba_segments s
,      dba_tables t
WHERE  t.table_name = 'T'
AND    s.segment_type = 'TABLE'
AND    s.segment_name = t.table_name
/

OWNER      SEGMENT_NAME     BLOCKS         MB     BLOCKS
---------- ------------ ---------- ---------- ----------
SYSADM     T                   640          5        543

SELECT  s.owner, s.segment_name
,       s.blocks, s.bytes/1024/1024 Mb
,       i.leaf_blocks
FROM    dba_segments s
,       dba_indexes i
WHERE   s.owner = i.owner
AND     s.segment_name = i.index_name
AND     i.table_name = 'T'
/

OWNER      SEGMENT_NAME     BLOCKS         MB LEAF_BLOCKS
---------- ------------ ---------- ---------- -----------
SYSADM     T_PK                256          2         187
SYSADM     T_AB                256          2         237
SYSADM     T_BC                256          2         213
SYSADM     T_BCD               512          4         488
SYSADM     T_BCD_ROUND         384          3         336
SYSADM     T_BCD_DESC          768          6         675

Now I will run a PL/SQL loop that randomly updates rows in the table with random values, 500,000 times. The random nature of the update will minimize the benefit of the caching. The database will continuously have to load blocks from disk back into buffer cache.

DECLARE 
  l_counter INTEGER := 0;
  l_t1 DATE;
  l_r1 INTEGER;
  l_r2 INTEGER;
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
BEGIN
  dbms_application_info.read_module(l_module, l_action);
  dbms_application_info.set_module('REDUNDANT','UPDATE TEST');
  l_t1 := SYSDATE + &&runtime/86400;
  WHILE l_t1 >= SYSDATE AND l_counter < 5e5 LOOP
    l_r1 := round(dbms_random.value(1,100),0);
    l_r2 := round(dbms_random.value(1,100),0);
    UPDATE t
    SET    c = l_r2
    ,      d = ROUND(l_r2,-1)
    WHERE  a = l_r1;
    l_counter := l_counter + 1;
    COMMIT;
  END LOOP; 
  dbms_output.put_line(l_counter||' rows updated');
  dbms_application_info.set_module(l_module, l_action);
END;
/

The activity in the buffer cache is visible via v$buffer_pool_statistics.

select * from v$buffer_pool_statistics where name = 'RECYCLE';

Before:

        ID NAME                 BLOCK_SIZE  SET_MSIZE  CNUM_REPL CNUM_WRITE   CNUM_SET    BUF_GOT  SUM_WRITE   SUM_SCAN
---------- -------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
FREE_BUFFER_WAIT WRITE_COMPLETE_WAIT BUFFER_BUSY_WAIT FREE_BUFFER_INSPECTED DIRTY_BUFFERS_INSPECTED DB_BLOCK_CHANGE
---------------- ------------------- ---------------- --------------------- ----------------------- ---------------
DB_BLOCK_GETS CONSISTENT_GETS PHYSICAL_READS PHYSICAL_WRITES
------------- --------------- -------------- ---------------
         2 RECYCLE                    8192        490        490          0        490     482786     758695          0
           65466               22517                0                889060                  612270        49971577
     71349961        18260691         385878          758695

After:

         2 RECYCLE                    8192        490        459         31        490     518646     853441          0
           73457               25942                0                981089                  688777        57003938
     81763622        19260981         413463          853441

We can see there have been physical reads and writes on objects in the recycle pool.
This is the execution plan of the statement in the PL/SQL loop. The update occurs in line 1 of the plan.

SQL_ID  3f305xbd6ts1d, child number 1
-------------------------------------
UPDATE T SET C = :B2 , D = ROUND(:B2 ,-1) WHERE A = :B1

Plan hash value: 795017363
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  UPDATE            | T    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| T_PK |     1 |    33 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Profile DML Overhead

Now, I will profile the recent ASH data for my test (filtered by module and action) by object and event, but we will only look at line 1 of execution plans where the SQL_OPNAME indicates a DML statement. Thus we are looking exclusively at the overhead of modifying data, and not the overhead of finding it in the first place.
The current_obj# and current_file# is used to identify database segments and tablespace name. However, these columns are reliable only for wait events that relate to physical I/O. They are not, for example, valid for CPU events.

column sql_id format a13
column sql_plan_hash_value heading 'SQL Plan|Hash Value'
column sql_plan_line_id heading 'SQL Plan|Line ID'
column top_level_sql_id format a13 heading 'Top Level|SQL ID'
column event format a25
column object_type format a6 heading 'Object|Type'
column object_name format a11
column p1text format a9
column p2text format a6
column p3text format a7
column action format a11
column ash_secs heading 'ASH|secs' format 9999
column current_file# heading 'File|#' format 9999
column current_obj# heading 'Curr|Obj#' format 999999
compute sum of ash_secs on report
break on report
WITH x AS (
SELECT event, action, current_obj#, current_file#
,      p1text, p2text, p3text
,      SUM(1) ash_secs
FROM   v$active_session_history
WHERE  module = 'REDUNDANT'
AND    sample_time >= SYSDATE - (10+&&runtime)/86400
AND    sql_plan_line_id = 1 /*line 1 is where the update occurs*/
AND    sql_opname IN('INSERT','UPDATE','DELETE','UPSERT') /*DML Statement*/
GROUP BY event, action, current_obj#, current_file#, p1text, p2text, p3text
)
SELECT x.*
,      o.object_type, o.object_name
,      f.tablespace_name
FROM   x
LEFT OUTER JOIN dba_objects o 
 ON  o.object_id = x.current_obj# 
 AND event is not null
LEFT OUTER JOIN dba_data_files f 
 ON  f.file_id = x.current_file# 
 AND event is not null
AND x.p1text = 'file#'
AND x.p2text = 'block#'
ORDER BY ash_secs DESC 
/

The test ran for 314 seconds, though we only spent 210 seconds updating the table. However, we can see how much time was spent time of writing to various indexes. Note that T_BC is still maintained even though it is invisible.

                                         Curr  File                            ASH Object
EVENT                     ACTION         Obj#     # P1TEXT    P2TEXT P3TEXT   secs Type   OBJECT_NAME TABLESPACE_NAME
------------------------- ----------- ------- ----- --------- ------ ------- ----- ------ ----------- -----------------------
free buffer waits         UPDATE TEST      -1     0 file#     block# set-id#    44
                          UPDATE TEST      -1     0 file#     block# blocks     28
write complete waits      UPDATE TEST   99411     4 file#     block#            19 INDEX  T_BC        PSDEFAULT
free buffer waits         UPDATE TEST   99412     0 file#     block# set-id#    14 INDEX  T_BCD
write complete waits      UPDATE TEST   99414     4 file#     block#            14 INDEX  T_BCD_DESC  PSDEFAULT
free buffer waits         UPDATE TEST   99411     0 file#     block# set-id#    12 INDEX  T_BC
write complete waits      UPDATE TEST   99412     4 file#     block#            12 INDEX  T_BCD       PSDEFAULT
write complete waits      UPDATE TEST   99413     4 file#     block#            11 INDEX  T_BCD_ROUND PSDEFAULT
                          UPDATE TEST   99411     0 file#     block# blocks     11
                          UPDATE TEST   99413     0 file#     block# blocks      8
free buffer waits         UPDATE TEST   99413     0 file#     block# set-id#     7 INDEX  T_BCD_ROUND
free buffer waits         UPDATE TEST   99414     0 file#     block# set-id#     7 INDEX  T_BCD_DESC
                          UPDATE TEST   99414     0 file#     block# blocks      5
                          UPDATE TEST   99412     0 file#     block# blocks      5
free buffer waits         UPDATE TEST   99412     4 file#     block# set-id#     3 INDEX  T_BCD       PSDEFAULT
log file switch (private  UPDATE TEST   99411     4                              1 INDEX  T_BC
strand flush incomplete)
free buffer waits         UPDATE TEST   99411     4 file#     block# set-id#     1 INDEX  T_BC        PSDEFAULT
                          UPDATE TEST   99413     4 file#     block# blocks      1
                          UPDATE TEST      -1     0 file#     block#             1
                          UPDATE TEST   99411     4 file#     block# blocks      1
                          UPDATE TEST   99411     0 file#     block#             1
log file switch completio UPDATE TEST   99414     0                              1 INDEX  T_BCD_DESC
free buffer waits         UPDATE TEST      -1     4 file#     block# set-id#     1                    PSDEFAULT
                          UPDATE TEST      -1     4 file#     block# blocks      1
                          UPDATE TEST   99414     0 file#     block#             1
                                                                             -----
sum                                                                            210

Now I know where time was spent maintaining which indexes, and I can decide whether it is worth dropping a particular index. In all, 33 seconds were spent maintaining index T_BC.
When I repeated the test, having dropped index T_BC, the runtime of the test goes down by 21 seconds to 293 seconds. The time spent maintaining other indexes also goes down. Time spent on index T_BCD went from 29 seconds to just 9 seconds. I speculate that this is due to better cache efficiency because it no longer has to cache index T_BC.
More time was spent on other wait events that are not directly associated with the index, so other forms of contention were introduced. However, the performance of the test as a whole improved, so we are in a better place.

                                         Curr  File                            ASH Object
EVENT                     ACTION         Obj#     # P1TEXT    P2TEXT P3TEXT   secs Type   OBJECT_NAME TABLESPACE_NAME
------------------------- ----------- ------- ----- --------- ------ ------- ----- ------ ----------- ---------------
free buffer waits         UPDATE TEST      -1     0 file#     block# set-id#    41
                          UPDATE TEST      -1     0 file#     block# blocks     39
                          UPDATE TEST   99419     0 file#     block# blocks     12
                          UPDATE TEST   99418     0 file#     block# blocks     11
free buffer waits         UPDATE TEST   99411     0 file#     block# set-id#    10
write complete waits      UPDATE TEST   99413     4 file#     block#            10                    PSDEFAULT
                          UPDATE TEST   99411     0 file#     block# blocks      9
free buffer waits         UPDATE TEST   99412     0 file#     block# set-id#     9
write complete waits      UPDATE TEST   99414     4 file#     block#             8                    PSDEFAULT
write complete waits      UPDATE TEST   99411     4 file#     block#             8                    PSDEFAULT
write complete waits      UPDATE TEST   99418     4 file#     block#             6 INDEX  T_BCD       PSDEFAULT
write complete waits      UPDATE TEST   99412     4 file#     block#             5                    PSDEFAULT
                          UPDATE TEST   99414     0 file#     block# blocks      4
free buffer waits         UPDATE TEST   99414     0 file#     block# set-id#     4
free buffer waits         UPDATE TEST   99413     0 file#     block# set-id#     3
write complete waits      UPDATE TEST   99419     4 file#     block#             3 INDEX  T_BCD_ROUND PSDEFAULT
                          UPDATE TEST   99412     0 file#     block# blocks      3
                          UPDATE TEST   99413     0 file#     block# blocks      3
                          UPDATE TEST   99420     0 file#     block# blocks      2
free buffer waits         UPDATE TEST   99418     0 file#     block# set-id#     2 INDEX  T_BCD
write complete waits      UPDATE TEST   99420     4 file#     block#             1 INDEX  T_BCD_DESC  PSDEFAULT
                          UPDATE TEST   99411     4 file#     block# blocks      1
write complete waits      UPDATE TEST   99415     4 file#     block#             1 TABLE  T           PSDEFAULT
                          UPDATE TEST   99418     0 file#     block#             1
                          UPDATE TEST   99414     0 file#     block#             1
log file switch (private  UPDATE TEST   99418     0                              1 INDEX  T_BCD
strand flush incomplete)
                          UPDATE TEST   99411     0 file#     block#             1
                          UPDATE TEST   99418     0                              1
free buffer waits         UPDATE TEST   99412     4 file#     block# set-id#     1                    PSDEFAULT
free buffer waits         UPDATE TEST      -1     4 file#     block# set-id#     1                    PSDEFAULT
                          UPDATE TEST      -1     0 file#     block#             1
free buffer waits         UPDATE TEST   99420     0 file#     block# set-id#     1 INDEX  T_BCD_DESC
                                                                             -----
sum                                                                            204

Friday, January 06, 2017

Removing Unnecessary Indexes: 2. Identifying Redundant Indexes

This is the second post in a series about unnecessary indexes and some of the challenges that they present.
The EDB360 utility (see described on Carlos Sierra's Tools & Tips blog) contains a report of redundant indexes within a database. The query in this post (also available on my website) is based on the one in EDB360, but here the column list is produced with the LISTAGG analytic function.
I have also extended it to handle function-based (and therefore descending) indexes. The detail of any function in the index is obtained from DBA_STAT_EXTENTIONS (where it is held in a CLOB column) rather than DBA_IND_EXPRESSIONS (where it is in a LONG column).
(update 18.1.2017) Carlos has added this query to EDB360 v1702 in Redundant Indexes (2)

REM redundantindex.sql
set lines 140 pages 99 trimspool on timi on
ttitle 'Redundant Indexes'
column table_owner      format a15 heading 'Table Owner'
column table_name       format a30 heading 'Table Name'
column index_name       format a20 heading 'Index Name'
column index_type       format a21 heading 'Index Type'
column superset_index   format a50 heading 'Superset Index' 
column redundant_index  format a40 heading 'Redundant Index'
column extension_name   format a30 heading 'Extension Name'
column visibility       heading 'Visi-|bility'
column constraint_type  format a4 heading 'Cons|Type' 
column constraint_name  format a30 heading 'Constraint Name'
spool redundantindex

WITH f AS ( /*function expressions*/
SELECT /*+MATERIALIZE*/ owner, table_name, extension, extension_name
FROM   dba_stat_extensions
where  creator = 'SYSTEM' /*exclude extended stats*/
), ic AS ( /*list indexed columns getting expressions from stat_extensions*/
SELECT /*+ MATERIALIZE*/ i.table_owner, i.table_name,
       i.owner index_owner, i.index_name,
       i.index_type, i.uniqueness, i.visibility,
       c.column_position,
       CASE WHEN f.extension IS NULL THEN c.column_name
            ELSE CAST(SUBSTR(REPLACE(SUBSTR(f.extension,2,LENGTH(f.extension)-2),'"',''),1,128) AS VARCHAR2(128))
       END column_name
  FROM dba_indexes i
     , dba_ind_columns c
       LEFT OUTER JOIN f
       ON f.owner = c.table_owner
       AND f.table_name = c.table_name
       AND f.extension_name = c.column_name
 WHERE c.table_owner NOT IN ('ANONYMOUS','APEX_030200','APEX_040000','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')
   AND c.table_owner NOT IN ('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL','PERFSTAT','STDBYPERF')
   AND C.table_name = 'T' 
   AND i.table_name = c.table_name
   AND i.owner = c.index_owner
   AND i.index_name = c.index_name
   AND i.index_type like '%NORMAL'
), i AS ( /*construct column list*/
SELECT /*+ MATERIALIZE*/
       ic.table_owner, ic.table_name,
       ic.index_owner, ic.index_name,
       ic.index_type, ic.uniqueness, ic.visibility,
       listagg(ic.column_name,',') within group (order by ic.column_position) AS column_list,
       '('||listagg('"'||ic.column_name||'"',',') within group (order by ic.column_position)||')' AS extension,
       count(*) num_columns
FROM ic
GROUP BY 
       ic.table_owner, ic.table_name,
       ic.index_owner, ic.index_name,
       ic.index_type, ic.uniqueness, ic.visibility
), e AS ( /*extended stats*/
SELECT /*+MATERIALIZE*/ owner, table_name, CAST(SUBSTR(extension,1,128) AS VARCHAR2(128)) extension, extension_name
FROM   dba_stat_extensions
where  creator = 'USER' /*extended stats not function based indexes*/
) 
SELECT r.table_owner, r.table_name,
       i.index_name||' ('||i.column_list||')' superset_index,
       r.index_name||' ('||r.column_list||')' redundant_index,
       c.constraint_type, c.constraint_name,
       r.index_type, r.visibility, e.extension_name
  FROM i r
       LEFT OUTER JOIN e
         ON  e.owner = r.table_owner
         AND e.table_name = r.table_name
         AND e.extension = r.extension
       LEFT OUTER JOIN dba_constraints c
         ON c.table_name = r.table_name
         AND c.index_owner = r.index_owner
         AND c.index_name = r.index_name
         AND c.owner = r.table_owner
         AND c.constraint_type IN('P','U')
     , i
 WHERE i.table_owner = r.table_owner
   AND i.table_name = r.table_name
   AND i.index_name != r.index_name
   AND i.column_list LIKE r.column_list||',%'
   AND i.num_columns > r.num_columns
 ORDER BY r.table_owner, r.table_name, r.index_name, i.index_name
/
spool off/

I added two more indexes to the example in the previous post, so I can demonstrate the script working with descending and other function-based indexes.

CREATE INDEX T_BCD_ROUND ON T (B, C, ROUND(D,0));
CREATE INDEX T_BCD_DESC  ON T (B, C, D DESC);

The report also shows whether the redundant index is already invisible, whether there are extended statistics on it, and whether there is a primary key or unique constraint than could be moved to the superset index..

Sat Jan 07                                                                                                                         page    1
                                                             Redundant Indexes

Table Owner     Table Name                     Superset Index                                     Redundant Index
--------------- ------------------------------ -------------------------------------------------- ----------------------------------------
Cons                                                      Visi-
Type Constraint Name                Index Type            bility    Extension Name
---- ------------------------------ --------------------- --------- ------------------------------
SYSADM          T                              T_BCD (B,C,D)                                      T_BC (B,C)
                                    NORMAL                INVISIBLE SYS_STU3$58HONF9VK$$69P2OW4P4X

SYSADM          T                              T_BCD_DESC (B,C,D)                                 T_BC (B,C)
                                    NORMAL                INVISIBLE SYS_STU3$58HONF9VK$$69P2OW4P4X

SYSADM          T                              T_BCD_ROUND (B,C,ROUND(D,0))                       T_BC (B,C)
                                    NORMAL                INVISIBLE SYS_STU3$58HONF9VK$$69P2OW4P4X

SYSADM          T                              T_AB (A,B)                                         T_PK (A)
P    T_PK                           NORMAL                VISIBLE
…

Monday, January 02, 2017

Removing Unnecessary Indexes: 1. The Need for Extended Statistics

This is the first post in a series about unnecessary indexes and some of the challenges that they present

I have always noted that it is remarkably easy to add indexes to systems, especially OLTP systems. However, removing them is generally more difficult.
The reward for removing an unused or at least an unnecessary index is that you no longer need to maintain it during DML operations, thus reducing I/O and not loading index blocks into the buffer cache. However, the risk is that performance will degrade somewhere because an execution plan changes.
I will start by looking at a class of indexes that are easily identified as candidates for removal. If the columns in a non-unique index are the same as the leading columns in another index then it is reasonable to remove it because any query that uses it could also use the other index. We cannot drop unique indexes because they are needed to enforce a unique or primary key constraint.
However, there is a catch. A multi-column index might be used by the optimizer to provide the number of distinct keys without being referenced in the execution plan. Here is a simple example.

CREATE TABLE t 
(a NUMBER
,b NUMBER
,c NUMBER
,d NUMBER
,CONSTRAINT t_pk PRIMARY KEY (a)
);

CREATE INDEX t_ab  ON t (a,b);
CREATE INDEX t_bc  ON t (b,c);
CREATE INDEX t_bcd ON t (b,c,d);

T_BC is a subset of T_BCD, so it is redundant and could be dropped.
(correction 7.1.2017) The primary key index T_PK is a subset of T_AB. It is also redundant because the primary key constraint, on A alone, could be altered to use index T_AB.

ALTER TABLE t MODIFY PRIMARY KEY USING INDEX t_ab;

However, I am going to leave the default primary key index in place for the rest of this demonstration. I will now populate the table and gather statistics but not histograms.

INSERT /*+APPEND*/ INTO t
WITH x AS (
SELECT  rownum-1 n FROM DUAL connect by level <= 1E5)
SELECT  n
, MOD(n,100)
, ROUND(MOD(n,100),-1)
, dbms_random.value(1,100)
FROM   x;
EXEC dbms_stats.gather_table_stats(null,'T',cascade=>TRUE,method_opt=>'FOR ALL COLUMNS SIZE 1');

The table has 100,000 rows, column B has 100 distinct, and column C has 11 distinct values. But there is a close correlation between them. The value of C is the value of B rounded off to the nearest 10.

column column_name format a20
SELECT column_name, num_distinct, num_buckets, num_nulls, histogram 
FROM user_tab_col_statistics
WHERE table_name = 'T'
ORDER BY 1
/

COLUMN_NAME         NUM_DISTINCT NUM_BUCKETS  NUM_NULLS HISTOGRAM                                            
-------------------- ------------ ----------- ---------- ---------------
A                         100000           1          0 NONE
B                            100           1          0 NONE
C                             11           1          0 NONE
D                         100000           1          0 NONE

We can also see that index T_BC has 100 distinct keys.

SELECT index_name, uniqueness, visibility, distinct_keys, clustering_factor
FROM user_indexes WHERE table_name = 'T'
/

INDEX_NAME UNIQUENES VISIBILIT DISTINCT_KEYS CLUSTERING_FACTOR
---------- --------- --------- ------------- -----------------
T_PK       UNIQUE    VISIBLE          100000               525
T_AB       NONUNIQUE VISIBLE          100000               525
T_BC       NONUNIQUE VISIBLE             100             52500
T_BCD      NONUNIQUE VISIBLE          100000             99894

Let's look at the execution plans for some simple queries: This first query uses index T_BC because. It doesn't even need to look up the table because column A is not null.

SELECT COUNT(A) FROM t WHERE b = 42 AND c = 40          
    
Plan hash value: 2931606879
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       5 |
|   1 |  SORT AGGREGATE   |      |      1 |      1 |     6 |            |          |      1 |00:00:00.01 |       5 |
|*  2 |   INDEX RANGE SCAN| T_BC |      1 |   1000 |  6000 |     3   (0)| 00:00:01 |   1000 |00:00:00.01 |       5 |
--------------------------------------------------------------------------------------------------------------------

This query references column D, which is nullable, so it uses index T_BCD.

SELECT COUNT(D) FROM t WHERE b = 42 AND c = 40

Plan hash value: 2099617883
--------------------------------------------------------------------------------------------------------------------- 
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT  |       |      1 |        |       |     8 (100)|          |      1 |00:00:00.01 |       8 | 
|   1 |  SORT AGGREGATE   |       |      1 |      1 |    28 |            |          |      1 |00:00:00.01 |       8 | 
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |   1000 | 28000 |     8   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |
---------------------------------------------------------------------------------------------------------------------

Note that in both cases the optimizer estimated it would receive 1000 rows, and it did.

Invisible Indexes

Rather than drop index T_BC, I am going to make it invisible and repeat the test. I would always do this in a real production situation before dropping the index later. If removing the index does have undesirable consequences it can instantly be made visible again, whereas rebuilding it may take time and cause contention on a large and active table.

ALTER INDEX t_bc INVISIBLE;

The execution plan for the first query has changed to use T_BCD (which is perfectly reasonable), but the estimated number of rows is now just 91, not the 1000 that were actually found. Although it didn't make a difference to this execution plan, it is this misestimate which could cause plan regressions in more complex cases.

SELECT COUNT(A) FROM t WHERE b = 42 AND c = 40          

Plan hash value: 2099617883                                   
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       8 |
|   1 |  SORT AGGREGATE   |       |      1 |      1 |     6 |            |          |      1 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |     91 |   546 |     3   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |
---------------------------------------------------------------------------------------------------------------------

The other query produces the same plan, with the same misestimate of the number of rows.

SELECT COUNT(D) FROM t WHERE b = 42 AND c = 40          
    
Plan hash value: 2099617883                                   
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |       |     3 (100)|          |      1 |00:00:00.02 |       8 |
|   1 |  SORT AGGREGATE   |       |      1 |      1 |    28 |            |          |      1 |00:00:00.02 |       8 |
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |     91 |  2548 |     3   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |
---------------------------------------------------------------------------------------------------------------------

The optimizer misestimates of the number of rows because it can no longer use index T_BC to tell it that there are only 100 distinct values on the table for the combination of columns B and C.

Extended Statistics

Now I will create extended statistics on the table

select dbms_stats.create_extended_stats(null,'T','(B,C)') from dual;
EXEC dbms_stats.gather_table_stats(null,'T',cascade=>TRUE,method_opt=>'FOR ALL COLUMNS SIZE 1');

Or I could just have done this. It comes to the same.

EXEC dbms_stats.gather_table_stats(null,'T',cascade=>TRUE,method_opt=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 1 (B,C)');

The extended statistics on the combination of columns B and C tells the optimizer that there are 100 distinct values for these columns. Note that the extended statistics only have a single bucket.

COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS  NUM_NULLS HISTOGRAM
------------------------------ ------------ ----------- ---------- -----------
A                                    100000           1          0 NONE
B                                       100           1          0 NONE
C                                        11           1          0 NONE
D                                    100000           1          0 NONE
SYS_STU3$58HONF9VK$$69P2OW4P4X          100           1          0 NONE

SELECT * FROM user_stat_extensions WHERE table_name = 'T'

TABLE_NAME EXTENSION_NAME                 EXTENSION            CREATO DRO
---------- ------------------------------ -------------------- ------ ---
T          SYS_STU3$58HONF9VK$$69P2OW4P4X ("B","C")            USER   YES

Now the optimizer reverts to correctly estimating the number of rows as 1000.

SELECT COUNT(A) FROM t WHERE b = 42 AND c = 40                                                                    
                                                                                                                        
Plan hash value: 2099617883                                                                                             
---------------------------------------------------------------------------------------------------------------------   
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |   
---------------------------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |       |      1 |        |       |     8 (100)|          |      1 |00:00:00.01 |       8 |   
|   1 |  SORT AGGREGATE   |       |      1 |      1 |     6 |            |          |      1 |00:00:00.01 |       8 |   
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |   1000 |  6000 |     8   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |   
---------------------------------------------------------------------------------------------------------------------

SELECT COUNT(D) FROM t WHERE b = 42 AND c = 40                                                                    
                                                                                                                        
Plan hash value: 2099617883 
---------------------------------------------------------------------------------------------------------------------   
| Id  | Operation         | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |   
---------------------------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |       |      1 |        |       |     8 (100)|          |      1 |00:00:00.02 |       8 |   
|   1 |  SORT AGGREGATE   |       |      1 |      1 |    28 |            |          |      1 |00:00:00.02 |       8 |   
|*  2 |   INDEX RANGE SCAN| T_BCD |      1 |   1000 | 28000 |     8   (0)| 00:00:01 |   1000 |00:00:00.01 |       8 |   
---------------------------------------------------------------------------------------------------------------------

Conclusion

If you are going to drop a redundant multi-column index, at least replace it with extended statistics on that combination of columns.
If you are going to add columns to an existing index rather than create another, perhaps to satisfy a query without the need to visit the table, then again consider creating extended statistics on the original list of columns.

Acknowledgement: my thanks to Jonathan Lewis for his blog: Index Usage, and for prompting the correction.

Subscribe to: Posts ( Atom )

The /+Go-Faster/ Oracle Blog

Tuesday, January 10, 2017

Removing Unnecessary Indexes: 3.Cost of Maintaining Indexes

Profile DML Overhead

Friday, January 06, 2017

Removing Unnecessary Indexes: 2. Identifying Redundant Indexes

Monday, January 02, 2017

Removing Unnecessary Indexes: 1. The Need for Extended Statistics

Invisible Indexes

Extended Statistics

Conclusion

Announcements

My Book

Oracle ACE Director

Twitter

Blog Archive

Tags

Other Oracle Blogs

Other PeopleSoft Blogs

OakTable Network

Oracle ACE Director

©David Kurtz