Monday, January 06, 2020

Partial Indexing

This is the second of two blog posts that discuss sparse and partial indexing.

Problem Statement

(This is the same problem statements as for sparse indexing)
It is not an uncommon requirement to find rows that match a rare value in a column with a small number of distinct values.  So, the distribution of values is skewed.  A typical example is a status column where an application processes newer rows that are a relatively small proportion of the table because over time the majority of rows have been processed and are at the final status.
An index is effective at finding the rare values, but it is usually more efficient to scan the table for the common values.  A histogram is would almost certainly be required on such a column.  However, if you build an index on the column you have to index all the rows.  The index is, therefore, larger and requires more maintenance.  Could we not just index the rare values for which we want to use the index to find?
  • Oracle does not index null values. If we could engineer that the common value was null, and then the index would only contain the rare values.  This is sometimes called sparse indexing and was discussed in the previous blog.
  • Or we could separate the rare and common values into different index partitions, and build only the index partition(s) for the rare values.  This is called partial indexing and is discussed in this blog.
As usual, this is not a new subject and other people have written extensively on these subjects, and I will provide links.  However, I want to draw some of the issues together.

Partition Table and Locally Partitioned Partial Index 

I could partition the table on the status column. Here, I have used list partitioning, because the common status is between the two rare status, so I only need two partitions not three. From Oracle 12.1, I can specify indexing on and off on the table and certain partitions so that later I can build partial local indexes only on some partitions. See also:
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other  VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
) INDEXING OFF 
PARTITION BY LIST (status)
(PARTITION t_status_rare   VALUES ('R','A') INDEXING ON
,PARTITION t_status_common VALUES (DEFAULT) 
) ENABLE ROW MOVEMENT 
/
INSERT /*+APPEND*/ INTO t --(key, status)
SELECT rownum
, CASE WHEN rownum<=1e6-1000 THEN 'C' 
       WHEN rownum<=1e6-10 THEN 'A' 
       ELSE 'R' END CASE
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');
Here Oracle eliminated the common status partition and only scanned the rare status partition (partition 1). Note that I don't even have an index at this point.  So simply partitioning the table can be effective.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
          10

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    58 |     4   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |      |     1 |    58 |            |          |       |       |
|   2 |   PARTITION LIST SINGLE|      |    10 |   580 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    TABLE ACCESS FULL   | T    |    10 |   580 |     4   (0)| 00:00:01 |     1 |     1 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("STATUS"='R')

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("STATUS"='R')
However, now when the application updates the status from R (rare) to C (common) the row must be moved between partitions. It is necessary to enable row movement on the table, otherwise, an error will be generated. However, there is additional overhead in moving the row. It is effectively deleted from one partition and inserted into the other.
In this test, I have increased the frequency of one of the rare statuses. Otherwise, the optimizer determines that it is cheaper just to scan the table partition than use the index!
SELECT status, COUNT(*)
FROM   t 
GROUP BY status
/
S   COUNT(*)
- ----------
R         10
A        990
C     999000
Note that I have already specified INDEXING OFF on the table and INDEXING ON on the rare statuses partition. Now I can just build a locally partitioned partial index.
CREATE INDEX t_status ON t(status) LOCAL INDEXING PARTIAL;
Note that only partition T_STATUS_RARE is physically built, and it only contains a single extent. Partition T_STATUS_COMMON exists, is unusable and the segment has not been physically built. It contains no rows and no leaf blocks.
SELECT partition_name, status, num_rows, leaf_blocks
from user_ind_partitions where index_name = 'T_STATUS';

PARTITION_NAME       STATUS     NUM_ROWS LEAF_BLOCKS
-------------------- -------- ---------- -----------
T_STATUS_COMMON      UNUSABLE          0           0
T_STATUS_RARE        USABLE         1000           2

SELECT segment_name, partition_name, blocks
FROM user_segments WHERE segment_name = 'T_STATUS';

SEGMENT_NAME PARTITION_NAME           BLOCKS
------------ -------------------- ----------
T_STATUS     T_STATUS_RARE                 8

SELECT segment_name, partition_name, segment_type, extent_id, blocks
FROM user_extents WHERE segment_name = 'T_STATUS';

SEGMENT_NAME PARTITION_NAME       SEGMENT_TYPE        EXTENT_ID     BLOCKS
------------ -------------------- ------------------ ---------- ----------
T_STATUS     T_STATUS_RARE        INDEX PARTITION             0          8
Scans for the common status value can only full scan the table partition because there is no index to use.
SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
      999000

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    55 |  2444   (1)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |      |     1 |    55 |            |          |       |       |
|   2 |   PARTITION LIST SINGLE|      |   998K|    52M|  2444   (1)| 00:00:01 |   KEY |   KEY |
|*  3 |    TABLE ACCESS FULL   | T    |   998K|    52M|  2444   (1)| 00:00:01 |     2 |     2 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("STATUS"='C')
To query the rare value Oracle does use the index on the rare values partition.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
          10

Plan hash value: 3051124889
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |          |     1 |    58 |     2   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE                             |          |     1 |    58 |            |          |       |       |
|   2 |   PARTITION LIST SINGLE                     |          |    10 |   580 |     2   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T        |    10 |   580 |     2   (0)| 00:00:01 |     1 |     1 |
|*  4 |     INDEX RANGE SCAN                        | T_STATUS |    10 |       |     1   (0)| 00:00:01 |     1 |     1 |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("STATUS"='R')
However, it is not worth using the index for the slightly more common status A.  Here, Oracle full scans the table partition.
SELECT COUNT(other) FROM t WHERE status='A';

COUNT(OTHER)
------------
         990

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    58 |     4   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |      |     1 |    58 |            |          |       |       |
|   2 |   PARTITION LIST SINGLE|      |   990 | 57420 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    TABLE ACCESS FULL   | T    |   990 | 57420 |     4   (0)| 00:00:01 |     1 |     1 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("STATUS"='A')
Note that to use partial indexing I also had to partition the table.

Globally Partitioned Index with Zero-Sized Unusable Partitions 

Since Oracle 11.2.0.4, it has been possible to achieve the same effect without partitioning the table, thus avoiding the overhead of row movement. See also:
This feature also worked in earlier versions, but Oracle built a single extent for each unusable partition.
Here, I will recreate a non-partitioned table.
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other  VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
) 
/
INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-1000 THEN 'C' 
       WHEN rownum<=1e6-10 THEN 'A' 
       ELSE 'R' END CASE
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;

exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');

SELECT status, COUNT(*)
FROM   t 
GROUP BY status
/ 

S   COUNT(*)
- ----------
R         10
C     999000
A        990
It is not possible to create a globally list-partitioned index. Oracle simply does not support it.
CREATE INDEX t_status ON t(status)
GLOBAL PARTITION BY LIST (status, id2)
(PARTITION t_status_common VALUES ('R','A')
,PARTITION t_status_rare   VALUES (DEFAULT)
);

GLOBAL PARTITION BY LIST (status)
                    *
ERROR at line 2:
ORA-14151: invalid table partitioning method
You can create a global range or hash partitioned index. It is unlikely that the hash values of the column will break down conveniently into particular hash values. In this example, I would still have needed to create 4 hash partitions and still build 2 of them.
WITH x as (
SELECT status, COUNT(*) freq
FROM   t 
GROUP BY status
) SELECT x.*
, dbms_utility.get_hash_value(status,0,2)
, dbms_utility.get_hash_value(status,0,4)
FROM x
/ 

S       FREQ DBMS_UTILITY.GET_HASH_VALUE(STATUS,0,2) DBMS_UTILITY.GET_HASH_VALUE(STATUS,0,4)
- ---------- --------------------------------------- ---------------------------------------
R        990                                       1                                       1
C    1009000                                       0                                       0
A         10                                       0                                       2
It is easier to create a globally range partitioned index. Although in my example, the common status lies between the two rare statuses, so I need to create three partitions. I will create the index unusable and build the two rare status partitions.
CREATE INDEX t_status ON t(status)
GLOBAL PARTITION BY RANGE (status)
(PARTITION t_status_rare1  VALUES LESS THAN ('C')
,PARTITION t_status_common VALUES LESS THAN ('D')
,PARTITION t_status_rare2  VALUES LESS THAN (MAXVALUE)
) UNUSABLE;
ALTER INDEX t_status REBUILD PARTITION t_status_rare1;
ALTER INDEX t_status REBUILD PARTITION t_status_rare2;
The index partition for the common status is unusable so Oracle can only full scan the table.
SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
      999000

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    55 |  2445   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    55 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |   999K|    52M|  2445   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("STATUS"='C')
However, for the rare statuses, Oracle scans the index and looks up each of the table rows.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
          10

Plan hash value: 2558590380
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |     1 |    55 |     2   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE                       |          |     1 |    55 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE              |          |    10 |   550 |     2   (0)| 00:00:01 |     3 |     3 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T        |    10 |   550 |     2   (0)| 00:00:01 |       |       |
|*  4 |     INDEX RANGE SCAN                  | T_STATUS |    10 |       |     1   (0)| 00:00:01 |     3 |     3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("STATUS"='R')

Conclusion 

The advantage of this global partitioning approach is that it does not require any change to application code, and it does not involve partitioning the table. However, you will have to remember not to rebuild the unusable partitions, otherwise, they will have to be maintained as the table changes until you make them unusable again and, they will consume space that you will only get back by recreating the entire index.
NB: Partitioning is a licenced option that is only available on the Enterprise Edition of the database.

No comments :