Monday, February 19, 2024

Table Clusters: 5. Using the Cluster Key Index instead of the Primary/Unique Key Index

This post is part of a series that discusses table clustering in Oracle.

  1. Introduction and Ancient History
  2. Cluster & Cluster Key Design Considerations
  3. Populating the Cluster with DBMS_PARALLEL_EXECUTE
  4. Checking the Cluster Key
  5. Using the Cluster Key Index instead of the Primary/Unique Key Index
  6. Testing the Cluster & Conclusion (TL;DR)

In my test case, the cluster key index is made up of the first 7 columns of the unique key index.  One side-effect of this similarity of the keys is that the optimizer may choose to use the cluster key index where previously it used the unique index.  

The cluster key index is a unique index.  It contains only one entry for each distinct cluster key value that points to the first block that contains rows with those cluster key values.  As we saw in the previous post, there are many rows in the table for each distinct cluster key.  Therefore, the cluster key index is much smaller than the unique index on any table in the cluster.  This contributes to making it appear cheaper to access.

The clustering factor is fundamental to determining the cost of using an index.  It is a measure of how many I/Os the database would perform if it were to read every row in that table via the index in index order.  Notwithstanding that blocks may be cached, every time the scan changes to a different data block in the table, that is another I/O.  

In my case, the clustering factor of the cluster key index is also the same value as the number of rows and the number of distinct keys.  This is because I have set the cluster size equal to the block size so that each cluster key value points to a different block, and each block only contains rows for a single cluster key value.  The clustering factor of the cluster key index is much lower than that of the unique indexes, also making it look cheaper to access.
TABLE_NAME           INDEX_NAME               UNIQUENES PREFIX_LENGTH LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR
-------------------- ------------------------ --------- ------------- ----------- ------------- ---------- -----------------
PS_GP_RSLT_CLUSTER   PS_GP_RSLT_CLUSTER_IDX   UNIQUE                       111541       8875383    8875383           8875383
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS           UNIQUE                8     1271559     152019130  152019130          10806251
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM          UNIQUE                8     8421658     762210387  762210387         101166426
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN           UNIQUE                9     3894799     327189471  327189471          31774871

I still need to create the unique index on the tables to enforce uniqueness. I have found that the optimizer tends to choose the cluster key index in preference to the unique index. The cost of accessing cluster key index is lower because it is smaller, and has a lower clustering factor. When I increased the length of the cluster key from 3 to 7 columns I also found that the size and clustering factor of the cluster key index increased, and the clustering factor for the unique indexes decreased, partly because the rows are less disordered with respect to the index key, and partly because the size of the table decreased because each cluster key is only stored one. Although this reduced the cost of accessing the unique indexes, I still find the optimizer tends to choose the cluster key index over the unique index.

Sometimes, the switch to the cluster key index is beneficial, but sometimes performance degrades as in the case of this query.
SELECT … 
FROM PS_GP_RSLT_ACUM RA ,PS_GP_ACCUMULATOR A ,PS_GP_PYE_HIST_WRK H 
WHERE H.EMPLID BETWEEN :1 AND :2 AND H.CAL_RUN_ID=:3 
AND H.RUN_CNTL_ID=:4 AND H.OPRID=:5 
AND H.EMPLID=RA.EMPLID 
AND H.EMPL_RCD=RA.EMPL_RCD 
AND H.GP_PAYGROUP=RA.GP_PAYGROUP 
AND H.CAL_ID=RA.CAL_ID 
AND H.ORIG_CAL_RUN_ID=RA.ORIG_CAL_RUN_ID 
AND H.HIST_CAL_RUN_ID=RA.CAL_RUN_ID 
AND H.RSLT_SEG_NUM=RA.RSLT_SEG_NUM 
AND RA.PIN_NUM=A.PIN_NUM 
AND RA.ACM_PRD_OPTN<>'1' 
AND(H.CALC_TYPE=A.CALC_TYPE OR H.HIST_TYPE= 'G') 
ORDER BY RA.EMPLID,H.PRC_ORD_TS,RA.EMPL_RCD,RA.PIN_NUM
PS_GP_PYE_HIST_WRK is equi-joined to PS_GP_RSLT_ACUM by all 7 cluster key columns, so the cluster key index can satisfy this join.  The plan has switched to using the cluster key index.

Plan hash value: 4007126853
 
-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                         |       |       |  2369 (100)|          |
|   1 |  SORT ORDER BY                          |                         |   133 | 36841 |  2369   (1)| 00:00:01 |
|*  2 |   FILTER                                |                         |       |       |            |          |
|*  3 |    HASH JOIN                            |                         |   133 | 36841 |  2368   (1)| 00:00:01 |
|   4 |     NESTED LOOPS                        |                         |   393 |   103K|  2348   (1)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| PS_GP_PYE_HIST_WRK      |  1164 |   156K|    12   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                  | PS_GP_PYE_HIST_WRK      |     1 |       |    11   (0)| 00:00:01 |
|*  7 |      TABLE ACCESS CLUSTER               | PS_GP_RSLT_ACUM         |     1 |   132 |     3   (0)| 00:00:01 |
|*  8 |       INDEX UNIQUE SCAN                 | PS_GP_RSLT_CLUSTER_IDX  |     1 |       |     1   (0)| 00:00:01 |
|   9 |     INDEX FAST FULL SCAN                | PSBGP_ACCUMULATOR       |  9208 | 64456 |    20   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------
The profile of the ASH data by plan line ID shows that most of the time is spent on physical I/O on line 7 of the plan, physically scanning the blocks in the cluster for each cluster key
   SQL Plan         SQL Plan                                               H   P E        ASH
 Hash Value          Line ID EVENT                                         P     x       Secs
----------- ---------------- --------------------------------------------- --- - --- --------
 4007126853                7 db file sequential read                       N   N Y        120
 4007126853                  db file sequential read                       N   N Y         80
I can force the plan back to using the unique index on PS_GP_RSLT_ACUM with a hint, SQL Profile, SQL Patch, or SQL Plan Baseline, and there is a reduction in database response time.
NB: You cannot make a cluster key index invisible.
Plan hash value: 1843812660
 
------------------------------------------------------------------------------------------------------------------
|   Id  | Operation                                 | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                          |                    |       |       |   845 (100)|          |
|     1 |  SORT ORDER BY                            |                    |     1 |   277 |   845   (1)| 00:00:01 |
|  *  2 |   FILTER                                  |                    |       |       |            |          |
|  *  3 |    HASH JOIN                              |                    |     1 |   277 |   844   (1)| 00:00:01 |
|-    4 |     NESTED LOOPS                          |                    |     1 |   277 |   844   (1)| 00:00:01 |
|-    5 |      STATISTICS COLLECTOR                 |                    |       |       |            |          |
|     6 |       NESTED LOOPS                        |                    |     1 |   270 |   843   (1)| 00:00:01 |
|     7 |        TABLE ACCESS BY INDEX ROWID        | PS_GP_PYE_HIST_WRK |   416 | 57408 |     6   (0)| 00:00:01 |
|  *  8 |         INDEX RANGE SCAN                  | PS_GP_PYE_HIST_WRK |     1 |       |     5   (0)| 00:00:01 |
|  *  9 |        TABLE ACCESS BY INDEX ROWID BATCHED| PS_GP_RSLT_ACUM    |     1 |   132 |     5   (0)| 00:00:01 |
|  * 10 |         INDEX RANGE SCAN                  | PS_GP_RSLT_ACUM    |     1 |       |     4   (0)| 00:00:01 |
|- * 11 |      INDEX RANGE SCAN                     | PSBGP_ACCUMULATOR  |     1 |     7 |     1   (0)| 00:00:01 |
|    12 |     INDEX FAST FULL SCAN                  | PSBGP_ACCUMULATOR  |     1 |     7 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

   SQL Plan         SQL Plan                                               H   P E        ASH
 Hash Value          Line ID EVENT                                         P     x       Secs
----------- ---------------- --------------------------------------------- --- - --- --------
 1843812660               10 db file sequential read                       N   N Y         70
 1843812660                9 db file sequential read                       N   N Y         60
 1843812660                  CPU+CPU Wait                                  N   N Y         50

Table Cached Blocks 

The table_cached_blocks statistics preference specifies the average number of blocks assumed to be cached in the buffer cache when calculating the index clustering factor. When DBMS_STATS calculates the clustering factor of an index it does not count visits to table blocks assumed to be cached because they were in the last n distinct table blocks visit, where n is the value to which table_cached_blocks is set.

We have already seen that with 7 cluster key columns, no more than 7 blocks are required to hold any one cluster key.  If I set table cached blocks to at least 7, then when Oracle scans the table blocks in unique key order (which matches the cluster key order for the first 7 columns) it does not count additional visits to blocks for the same cluster key. Thus we see a reduction in the clustering factor on the unique index. There is no advantage to a higher value of this setting. We do not see a significant reduction in the clustering factor on other indexes with different leading columns. 

TCB=1
TABLE_NAME           INDEX_NAME               PREFIX_LENGTH LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR DEGREE     LAST_ANALYZED    
-------------------- ------------------------ ------------- ----------- ---------- ----------------- ---------- -----------------
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS                       8     1271559  152019130          10806251 1          12-01-24 15:33:02
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM                      8     8421658  762210387         101166426 1          12-01-24 15:37:55
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN                       9     3894799  327189471          31774872 1          12-01-24 15:39:00
TCB=8
 TABLE_NAME           INDEX_NAME               PREFIX_LENGTH LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR DEGREE     LAST_ANALYZED    
-------------------- ------------------------ ------------- ----------- ---------- ----------------- ---------- -----------------
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS                       8     1271559  152019130           8217000 1          12-01-24 15:05:42
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM                      8     8421658  762210387          16658798 1          12-01-24 15:10:40
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN                       9     3894799  327189471          11321888 1          12-01-24 15:01:37

TCB=16

TABLE_NAME           INDEX_NAME               PREFIX_LENGTH LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR DEGREE     LAST_ANALYZED    
-------------------- ------------------------ ------------- ----------- ---------- ----------------- ---------- -----------------
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS                       8     1271559  152019130           8217000 1          12-01-24 15:44:25
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM                      8     8421658  762210387          16658710 1          12-01-24 15:49:29
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN                       9     3894799  327189471          11321888 1          12-01-24 15:50:36

The reduction in the clustering factor can mitigate the optimizer's tendency to use the cluster key index, but it may still occur.

NB: table_cached_blocks applies only when gathering statistics with DBMS_STATS, and not to CREATE INDEX or REBUILD INDEX operations that use the default value of 1.  This is not a bug, it is in the DBMS_STATS documentation.  

See also

TL;DR

The statistics on the cluster key index may lead the optimizer to determine the cost of using it is lower than the unique index.  The switch from the unique/primary key index to the cluster key index may result in poorer performance.  Setting Table Cached Blocks on the tables in the cluster may help.  However, you may still need to use SQL Profiles/SQL Plan Baselines/SQL Patches to force the optimizer to continue to use the unique indexes.

No comments :