Wednesday, February 14, 2024

Table Clusters: 2. Cluster & Cluster Key Design Considerations

This post is part of a series that discusses table clustering in Oracle.

At the beginning of each PeopleSoft payroll calculation process, all the previously calculated result data that is about to be recalculated by that process is deleted; one delete statement for each result table.  The new result data is inserted as each employee is calculated.  As multiple calculation processes run concurrently, their data tends to get mixed up in the result tables.  So the delete statements will concurrently update different rows in the same data block, leading to the database needing to do additional work to ensure read consistency.  
The result tables are not subsequently updated.  Therefore, they are reasonable candidates for building in a table cluster.

Cluster Design Considerations

The original purpose of table clusters was to co-locate rows from different tables that would generally be queried together, in the same data blocks.  This makes retrieval easier by reducing disk I/Os and access time.  Less storage is required because cluster keys are not repeated in either the cluster or the cluster key index.  As disks have become bigger and faster, and memory has become more plentiful, this is less often a consideration.

In this case, I am interested in avoiding read consistency contention.  I want each data block in the cluster to contain only rows with a single distinct cluster key value so that different transactions relating to different employees, and therefore different cluster keys, will be involved in different data blocks.  Therefore, each data block in the cluster will be subject to no more than one concurrent transaction, and the database will not have to maintain multiple read-consistent versions.  I will still avoid the read consistency overhead whether I store multiple tables in one cluster or different tables in different clusters.

The size attribute of the CREATE CLUSTER command specifies the amount of space in bytes reserved to store all rows with the same cluster key value.  Oracle will round it up to the next divisor of the block size.  Thus, if it is greater than half the size of the data block, the database will reserve at least one whole data block for each cluster value.  In my case, the data blocks are 8192 bytes (the default size), so I have set the size equal to the block size. 

I don't know in advance how many distinct cluster key values my data will have, and it will change over time.  Therefore, I will be creating indexed clusters, and I have to build a B-tree index on the cluster key.

I have found that the optimizer tends to choose the cluster key index rather than the longer unique index to search the table because it only has one row per cluster key and is, therefore, smaller and cheaper.  However, it may then have to scan all the blocks for that cluster key, which may in practice take longer.

If one table already frequently fills or exceeds a single block for each cluster key, there is unlikely to be any advantage to adding another table to the same cluster because if Oracle uses the cluster key index, it will then scan all the blocks for that key.  

In my case, I have found that two of the three tables that I plan to cluster, each require more than one block per cluster key, and the third almost fills a block per cluster key.  Therefore, I have decided to put each table in a separate cluster, albeit with the same cluster key.

Cluster Key Design Considerations

The columns listed in the CREATE CLUSTER command specify the cluster key.  They will be used to group data together.  The tables in the cluster have many unique key columns in common.  The first 7 columns of the unique key have been used for cluster key columns.  This is enough to prevent the number of rows per cluster key from growing indefinitely, but not so many that you end up with only a few rows per cluster key, which would result in most table blocks being only partially filled.  This would consume space and increase I/O.

The cluster key is indexed to help find the data blocks for a particular key, just as you would on any other table.  You do not specify columns when creating this index, because it uses the cluster key columns.
CREATE CLUSTER cluster_gp_rslt_abs
(EMPLID VARCHAR2(11), CAL_RUN_ID VARCHAR2(18), EMPL_RCD SMALLINT, GP_PAYGROUP VARCHAR2(10)
,CAL_ID VARCHAR2(18), ORIG_CAL_RUN_ID VARCHAR2(18), RSLT_SEG_NUM SMALLINT)
SIZE 8192 /*one block per cluster value*/
TABLESPACE GPAPP
/
CREATE INDEX cluster_gp_rslt_abs_idx ON CLUSTER cluster_gp_rslt_abs
/

CREATE TABLE psygp_rslt_abs (EMPLID VARCHAR2(11) NOT NULL,
   CAL_RUN_ID  VARCHAR2(18) NOT NULL,
   EMPL_RCD    SMALLINT NOT NULL,
   GP_PAYGROUP VARCHAR2(10) NOT NULL,
   CAL_ID      VARCHAR2(18) NOT NULL,
   ORIG_CAL_RUN_ID VARCHAR2(18) NOT NULL,
   RSLT_SEG_NUM SMALLINT NOT NULL,
…
) CLUSTER cluster_gp_rslt_abs (EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM)
/ CREATE CLUSTER cluster_gp_rslt_acum (EMPLID VARCHAR2(11), CAL_RUN_ID VARCHAR2(18), EMPL_RCD SMALLINT, GP_PAYGROUP VARCHAR2(10) ,CAL_ID VARCHAR2(18), ORIG_CAL_RUN_ID VARCHAR2(18), RSLT_SEG_NUM SMALLINT) SIZE 8192 TABLESPACE GPAPP / CREATE INDEX cluster_gp_rslt_acum_idx ON CLUSTER cluster_gp_rslt_acum / CREATE TABLE psygp_rslt_acum (EMPLID VARCHAR2(11) NOT NULL, … ) CLUSTER cluster_gp_rslt_acum (EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM) / CREATE CLUSTER cluster_gp_rslt_pin (EMPLID VARCHAR2(11), CAL_RUN_ID VARCHAR2(18), EMPL_RCD SMALLINT, GP_PAYGROUP VARCHAR2(10) ,CAL_ID VARCHAR2(18), ORIG_CAL_RUN_ID VARCHAR2(18), RSLT_SEG_NUM SMALLINT) SIZE 8192 TABLESPACE GPAPP / CREATE INDEX cluster_gp_rslt_pin_idx ON CLUSTER cluster_gp_rslt_pin / CREATE TABLE PSYGP_RSLT_PIN (EMPLID VARCHAR2(11) NOT NULL, … ) CLUSTER cluster_gp_rslt_pin (EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM) / …

The indexes on the result tables required by the application, including the unique key indexes, were recreated on the result tables after they had been rebuilt in the cluster and repopulated.  I have only shown the DDL for the unique indexes below.  It is not different to build an index on a clustered table than on a normal heap table.

CREATE UNIQUE  INDEX PS_GP_RSLT_ABS ON PS_GP_RSLT_ABS 
(EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM, ABSENCE_DATE, PIN_TAKE_NUM) 
PCTFREE 1 COMPRESS 8 … TABLESPACE PSINDEX  
/
…
CREATE UNIQUE  INDEX PS_GP_RSLT_ACUM ON PS_GP_RSLT_ACUM 
(EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM, PIN_NUM, EMPL_RCD_ACUM, 
,ACM_FROM_DT, ACM_THRU_DT, SLICE_BGN_DT, SEQ_NUM8) 
PCTFREE 1 COMPRESS 8 … TABLESPACE PSINDEX  
/
…
CREATE UNIQUE  INDEX PS_GP_RSLT_PIN ON PS_GP_RSLT_PIN 
(EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM, INSTANCE, PIN_NUM, SLICE_BGN_DT, SLICE_END_DT) 
PCTFREE 1 COMPUTE STATISTICS COMPRESS 9 … TABLESPACE PSINDEX 
/
…

See also Oracle 19c DBA Guide, Guidelines for Managing Clusters

No comments :