Friday, January 06, 2017

Removing Unnecessary Indexes: 2. Identifying Redundant Indexes

This is the second post in a series about unnecessary indexes and some of the challenges that they present.
The EDB360 utility (see described on Carlos Sierra's Tools & Tips blog) contains a report of redundant indexes within a database. The query in this post (also available on my website) is based on the one in EDB360, but here the column list is produced with the LISTAGG analytic function.
 I have also extended it to handle function-based (and therefore descending) indexes. The detail of any function in the index is obtained from DBA_STAT_EXTENTIONS (where it is held in a CLOB column) rather than DBA_IND_EXPRESSIONS (where it is in a LONG column).
(update 18.1.2017) Carlos has added this query to EDB360 v1702 in Redundant Indexes (2)
REM redundantindex.sql
set lines 140 pages 99 trimspool on timi on
ttitle 'Redundant Indexes'
column table_owner      format a15 heading 'Table Owner'
column table_name       format a30 heading 'Table Name'
column index_name       format a20 heading 'Index Name'
column index_type       format a21 heading 'Index Type'
column superset_index   format a50 heading 'Superset Index' 
column redundant_index  format a40 heading 'Redundant Index'
column extension_name   format a30 heading 'Extension Name'
column visibility       heading 'Visi-|bility'
column constraint_type  format a4 heading 'Cons|Type' 
column constraint_name  format a30 heading 'Constraint Name'
spool redundantindex

WITH f AS ( /*function expressions*/
SELECT /*+MATERIALIZE*/ owner, table_name, extension, extension_name
FROM   dba_stat_extensions
where  creator = 'SYSTEM' /*exclude extended stats*/
), ic AS ( /*list indexed columns getting expressions from stat_extensions*/
SELECT /*+ MATERIALIZE*/ i.table_owner, i.table_name,
       i.owner index_owner, i.index_name,
       i.index_type, i.uniqueness, i.visibility,
       c.column_position,
       CASE WHEN f.extension IS NULL THEN c.column_name
            ELSE CAST(SUBSTR(REPLACE(SUBSTR(f.extension,2,LENGTH(f.extension)-2),'"',''),1,128) AS VARCHAR2(128))
       END column_name
  FROM dba_indexes i
     , dba_ind_columns c
       LEFT OUTER JOIN f
       ON f.owner = c.table_owner
       AND f.table_name = c.table_name
       AND f.extension_name = c.column_name
 WHERE c.table_owner NOT IN ('ANONYMOUS','APEX_030200','APEX_040000','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')
   AND c.table_owner NOT IN ('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL','PERFSTAT','STDBYPERF')
   AND C.table_name = 'T' 
   AND i.table_name = c.table_name
   AND i.owner = c.index_owner
   AND i.index_name = c.index_name
   AND i.index_type like '%NORMAL'
), i AS ( /*construct column list*/
SELECT /*+ MATERIALIZE*/
       ic.table_owner, ic.table_name,
       ic.index_owner, ic.index_name,
       ic.index_type, ic.uniqueness, ic.visibility,
       listagg(ic.column_name,',') within group (order by ic.column_position) AS column_list,
       '('||listagg('"'||ic.column_name||'"',',') within group (order by ic.column_position)||')' AS extension,
       count(*) num_columns
FROM ic
GROUP BY 
       ic.table_owner, ic.table_name,
       ic.index_owner, ic.index_name,
       ic.index_type, ic.uniqueness, ic.visibility
), e AS ( /*extended stats*/
SELECT /*+MATERIALIZE*/ owner, table_name, CAST(SUBSTR(extension,1,128) AS VARCHAR2(128)) extension, extension_name
FROM   dba_stat_extensions
where  creator = 'USER' /*extended stats not function based indexes*/
) 
SELECT r.table_owner, r.table_name,
       i.index_name||' ('||i.column_list||')' superset_index,
       r.index_name||' ('||r.column_list||')' redundant_index,
       c.constraint_type, c.constraint_name,
       r.index_type, r.visibility, e.extension_name
  FROM i r
       LEFT OUTER JOIN e
         ON  e.owner = r.table_owner
         AND e.table_name = r.table_name
         AND e.extension = r.extension
       LEFT OUTER JOIN dba_constraints c
         ON c.table_name = r.table_name
         AND c.index_owner = r.index_owner
         AND c.index_name = r.index_name
         AND c.owner = r.table_owner
         AND c.constraint_type IN('P','U')
     , i
 WHERE i.table_owner = r.table_owner
   AND i.table_name = r.table_name
   AND i.index_name != r.index_name
   AND i.column_list LIKE r.column_list||',%'
   AND i.num_columns > r.num_columns
 ORDER BY r.table_owner, r.table_name, r.index_name, i.index_name
/
spool off/
I added two more indexes to the example in the previous post, so I can demonstrate the script working with descending and other function-based indexes.
CREATE INDEX T_BCD_ROUND ON T (B, C, ROUND(D,0));
CREATE INDEX T_BCD_DESC  ON T (B, C, D DESC);
The report also shows whether the redundant index is already invisible, whether there are extended statistics on it, and whether there is a primary key or unique constraint than could be moved to the superset index..
Sat Jan 07                                                                                                                         page    1
                                                             Redundant Indexes

Table Owner     Table Name                     Superset Index                                     Redundant Index
--------------- ------------------------------ -------------------------------------------------- ----------------------------------------
Cons                                                      Visi-
Type Constraint Name                Index Type            bility    Extension Name
---- ------------------------------ --------------------- --------- ------------------------------
SYSADM          T                              T_BCD (B,C,D)                                      T_BC (B,C)
                                    NORMAL                INVISIBLE SYS_STU3$58HONF9VK$$69P2OW4P4X

SYSADM          T                              T_BCD_DESC (B,C,D)                                 T_BC (B,C)
                                    NORMAL                INVISIBLE SYS_STU3$58HONF9VK$$69P2OW4P4X

SYSADM          T                              T_BCD_ROUND (B,C,ROUND(D,0))                       T_BC (B,C)
                                    NORMAL                INVISIBLE SYS_STU3$58HONF9VK$$69P2OW4P4X

SYSADM          T                              T_AB (A,B)                                         T_PK (A)
P    T_PK                           NORMAL                VISIBLE
…

3 comments :

Mohamed Houri said...

Hello

According to my experience it is probably safe to get ride of a redundant index where the redundancy is on a single column and where the container index (the index that will remain) has not too much extra column (>= 3 is probably going to cause an issue).

However, when the redundancy is made on two (or more) columns and the container index contains two (or more) extra columns then this will very probably cause an issue.

Simply put, redundant indexes that are safe to be dropped (after a period of invisibility) are those with a single column covered by an index with two columns.

Best regards
Mohamed Houri

David Kurtz said...

I would refer you to the first post in this sequence The Need for Extended Statistics. When you remove a single-column index, the optimizer can get the number of distinct values from the ordinary column statistics. However, when you remove a multi-column index you need to provide extended statistics to replace the information about number of distinct values that were being provided by the index. If you add extended statistics then it is safe to remove multi-column indexes.

Mohamed Houri said...

David,

I was not referring to the role of the extended statistics which, in the absence of the index, can help the Optimizer getting a correct estimation. I was referring to the CBO refusing to use the remaining index while it was using the previous (redondant one). The extended statistics works only with equalities and, I have seen many situations where the extended statistics are not used because the indexed columns have histogram while the resulting virtual column representing the column group is not a skewed column. In addition, the Clustering Factor of the contained index is very often not as good as that of the redondant index which when deleted might reveal that the cost of full scanning a table is cheaper than scanning the remaining index.

I have had the occasion to set so many redondant indexes in an invisible status and have come up with the conclusion (although heuristic) mentioned in my previous e-mail where this option might work very well when the contained and the container indexes are of 1 and 2 columns respectively.

Best regards
Mohamed Houri

Best regards
Mohamed