Thursday, February 22, 2024

Table Clusters: 6. Testing the Cluster & Conclusion (TL;DR)

This post is the last part of a series that discusses table clustering in Oracle.

  1. Introduction and Ancient History
  2. Cluster & Cluster Key Design Considerations
  3. Populating the Cluster with DBMS_PARALLEL_EXECUTE
  4. Checking the Cluster Key
  5. Using the Cluster Key Index instead of the Primary/Unique Key Index
  6. Testing the Cluster & Conclusion (TL;DR)

Testing

We did get improved performance with the clustered tables.  More significantly, we encountered less inter-process contention, and so were able to run more concurrent processes, and the overall elapsed time of all the processes was reduced.

Looking at just the performance of the bulk delete statements on the result tables, there is a significant reduction in DB time and physical I/O time on the clustered tables.  The reduction in physical I/O is not only because the table is smaller, but also because there is no need to perform consistent read recovery on the blocks: there are fewer reads from the undo segments, and less CPU is consumed creating consistent read copies in the buffer cache.
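The figures in the tables below were derived from Active Session History.  A query along these lines profiles the DB time for a statement by wait event.  This is only a sketch, not necessarily the exact query used to produce these numbers: the SQL text predicate is illustrative, and each ASH sample is taken to represent roughly one second of DB time.

SELECT h.sql_id, NVL(h.event,'CPU') event
,      COUNT(*) ash_secs  /*each ASH sample represents ~1 second of DB time*/
FROM   gv$active_session_history h
WHERE  h.sql_id IN(
       SELECT sql_id FROM gv$sql
       WHERE  sql_text LIKE 'DELETE FROM PS_GP_RSLT_ACUM%')
GROUP BY h.sql_id, h.event
ORDER BY ash_secs DESC
/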

Statement: DELETE FROM PS_GP_RSLT_ACUM…      Heap Table  Clustered Table
-------------------------------------------- ---------- ---------------
DB Time (s)                                        2182            1662
db file sequential (delete statement only)         1451             891
CPU                                                 941             531


Statement: DELETE FROM PS_GP_RSLT_ABS…       Heap Table  Clustered Table
-------------------------------------------- ---------- ---------------
DB Time (s)                                         781             330
db file sequential (delete statement only)          340             210
CPU                                                 300             120

GP_RSLT_PIN is another, albeit smaller, result table.  It is a candidate for clustering; however, it was not clustered for this test and therefore did not show any significant improvement.  It was subsequently clustered.

Statement: DELETE FROM PS_GP_RSLT_PIN…       Heap Table  Heap in Cluster Test
-------------------------------------------- ---------- --------------------
DB Time (s)                                         270                  250
db file sequential (delete statement only)          110                  120
CPU                                                 110                   90

The execution plans for some queries on clustered tables changed to use the cluster key index, which resulted in poorer performance.  I had to introduce some SQL profiles to reinstate the original execution plans.  
However, the execution plans for these delete statements also switched to the cluster key index, resulting in improved performance.  So it depends.

Conclusion (TL;DR)

Table partitioning can help you find data efficiently by allowing the database to eliminate partitions that cannot contain the data.  However, you must be running Enterprise Edition and license the partitioning option.
Table clustering is effective when you are regularly querying data from multiple tables with similar keys, and you can store them in the same data blocks, thus saving the overhead of retrieving multiple blocks.  It is available on any Oracle database and does not require any additional licence.
Both partitioning and clustering can help avoid the overhead of read consistency by storing dissimilar data in different blocks.
Sometimes, using the cluster key index can result in worse performance than using the original indexes.  A SQL profile or SQL baseline may be needed to stabilise some execution plans.

Monday, February 19, 2024

Table Clusters: 5. Using the Cluster Key Index instead of the Primary/Unique Key Index

This post is part of a series that discusses table clustering in Oracle.

  1. Introduction and Ancient History
  2. Cluster & Cluster Key Design Considerations
  3. Populating the Cluster with DBMS_PARALLEL_EXECUTE
  4. Checking the Cluster Key
  5. Using the Cluster Key Index instead of the Primary/Unique Key Index
  6. Testing the Cluster & Conclusion (TL;DR)

In my test case, the cluster key index is made up of the first 7 columns of the unique key index.  One side-effect of this similarity of the keys is that the optimizer may choose to use the cluster key index where previously it used the unique index.  

The cluster key index is a unique index.  It contains only one entry for each distinct cluster key value that points to the first block that contains rows with those cluster key values.  As we saw in the previous post, there are many rows in the table for each distinct cluster key.  Therefore, the cluster key index is much smaller than the unique index on any table in the cluster.  This contributes to making it appear cheaper to access.

The clustering factor is fundamental to determining the cost of using an index.  It is a measure of how many I/Os the database would perform if it were to read every row in that table via the index in index order.  Notwithstanding that blocks may be cached, every time the scan changes to a different data block in the table, that is another I/O.  

In my case, the clustering factor of the cluster key index is the same as its number of rows and its number of distinct keys.  This is because I have set the cluster SIZE equal to the block size, so that each cluster key value points to a different block, and each block only contains rows for a single cluster key value.  The clustering factor of the cluster key index is much lower than that of the unique indexes, also making it look cheaper to access.
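The index statistics below come from USER_INDEXES.  A query along these lines produces the listing (a sketch; the table and index names are those used in this test):

SELECT table_name, index_name, uniqueness, prefix_length
,      leaf_blocks, distinct_keys, num_rows, clustering_factor
FROM   user_indexes
WHERE  table_name IN('PS_GP_RSLT_ABS','PS_GP_RSLT_ACUM','PS_GP_RSLT_PIN')
OR     index_name = 'PS_GP_RSLT_CLUSTER_IDX'
ORDER BY table_name
/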
TABLE_NAME           INDEX_NAME               UNIQUENES PREFIX_LENGTH LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR
-------------------- ------------------------ --------- ------------- ----------- ------------- ---------- -----------------
PS_GP_RSLT_CLUSTER   PS_GP_RSLT_CLUSTER_IDX   UNIQUE                       111541       8875383    8875383           8875383
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS           UNIQUE                8     1271559     152019130  152019130          10806251
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM          UNIQUE                8     8421658     762210387  762210387         101166426
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN           UNIQUE                9     3894799     327189471  327189471          31774871

I still need to create the unique indexes on the tables to enforce uniqueness.  However, I have found that the optimizer tends to choose the cluster key index in preference to the unique index.  The cost of accessing the cluster key index is lower because it is smaller and has a lower clustering factor.  When I increased the length of the cluster key from 3 to 7 columns, I also found that the size and clustering factor of the cluster key index increased, while the clustering factor of the unique indexes decreased: partly because the rows are less disordered with respect to the index key, and partly because the table is smaller since each cluster key is only stored once.  Although this reduced the cost of accessing the unique indexes, I still find that the optimizer tends to choose the cluster key index over the unique index.

Sometimes, the switch to the cluster key index is beneficial, but sometimes performance degrades as in the case of this query.
SELECT … 
FROM PS_GP_RSLT_ACUM RA ,PS_GP_ACCUMULATOR A ,PS_GP_PYE_HIST_WRK H 
WHERE H.EMPLID BETWEEN :1 AND :2 AND H.CAL_RUN_ID=:3 
AND H.RUN_CNTL_ID=:4 AND H.OPRID=:5 
AND H.EMPLID=RA.EMPLID 
AND H.EMPL_RCD=RA.EMPL_RCD 
AND H.GP_PAYGROUP=RA.GP_PAYGROUP 
AND H.CAL_ID=RA.CAL_ID 
AND H.ORIG_CAL_RUN_ID=RA.ORIG_CAL_RUN_ID 
AND H.HIST_CAL_RUN_ID=RA.CAL_RUN_ID 
AND H.RSLT_SEG_NUM=RA.RSLT_SEG_NUM 
AND RA.PIN_NUM=A.PIN_NUM 
AND RA.ACM_PRD_OPTN<>'1' 
AND(H.CALC_TYPE=A.CALC_TYPE OR H.HIST_TYPE= 'G') 
ORDER BY RA.EMPLID,H.PRC_ORD_TS,RA.EMPL_RCD,RA.PIN_NUM
PS_GP_PYE_HIST_WRK is equi-joined to PS_GP_RSLT_ACUM by all 7 cluster key columns, so the cluster key index can satisfy this join.  The plan has switched to using the cluster key index.

Plan hash value: 4007126853
 
-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                         |       |       |  2369 (100)|          |
|   1 |  SORT ORDER BY                          |                         |   133 | 36841 |  2369   (1)| 00:00:01 |
|*  2 |   FILTER                                |                         |       |       |            |          |
|*  3 |    HASH JOIN                            |                         |   133 | 36841 |  2368   (1)| 00:00:01 |
|   4 |     NESTED LOOPS                        |                         |   393 |   103K|  2348   (1)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| PS_GP_PYE_HIST_WRK      |  1164 |   156K|    12   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                  | PS_GP_PYE_HIST_WRK      |     1 |       |    11   (0)| 00:00:01 |
|*  7 |      TABLE ACCESS CLUSTER               | PS_GP_RSLT_ACUM         |     1 |   132 |     3   (0)| 00:00:01 |
|*  8 |       INDEX UNIQUE SCAN                 | PS_GP_RSLT_CLUSTER_IDX  |     1 |       |     1   (0)| 00:00:01 |
|   9 |     INDEX FAST FULL SCAN                | PSBGP_ACCUMULATOR       |  9208 | 64456 |    20   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------
The profile of the ASH data by plan line ID shows that most of the time is spent on physical I/O on line 7 of the plan, physically scanning the blocks in the cluster for each cluster key.
   SQL Plan         SQL Plan                                               H   P E        ASH
 Hash Value          Line ID EVENT                                         P     x       Secs
----------- ---------------- --------------------------------------------- --- - --- --------
 4007126853                7 db file sequential read                       N   N Y        120
 4007126853                  db file sequential read                       N   N Y         80
I can force the plan back to using the unique index on PS_GP_RSLT_ACUM with a hint, SQL Profile, SQL Patch, or SQL Plan Baseline, and there is a reduction in database response time.
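For example, a SQL patch can attach an index hint to the statement without changing the application code.  The following is only a sketch: the SQL ID, query block and alias names are illustrative and would have to be taken from the actual statement and its outline.

DECLARE
  l_patch VARCHAR2(128);
BEGIN
  l_patch := DBMS_SQLDIAG.CREATE_SQL_PATCH(
    sql_id    => '0wkkyzy9ga4dk' /*illustrative SQL ID*/
  , hint_text => 'INDEX(@"SEL$1" "RA"@"SEL$1" "PS_GP_RSLT_ACUM")'
  , name      => 'force_gp_rslt_acum_unique_index');
END;
/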
NB: You cannot make a cluster key index invisible.
Plan hash value: 1843812660
 
------------------------------------------------------------------------------------------------------------------
|   Id  | Operation                                 | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                          |                    |       |       |   845 (100)|          |
|     1 |  SORT ORDER BY                            |                    |     1 |   277 |   845   (1)| 00:00:01 |
|  *  2 |   FILTER                                  |                    |       |       |            |          |
|  *  3 |    HASH JOIN                              |                    |     1 |   277 |   844   (1)| 00:00:01 |
|-    4 |     NESTED LOOPS                          |                    |     1 |   277 |   844   (1)| 00:00:01 |
|-    5 |      STATISTICS COLLECTOR                 |                    |       |       |            |          |
|     6 |       NESTED LOOPS                        |                    |     1 |   270 |   843   (1)| 00:00:01 |
|     7 |        TABLE ACCESS BY INDEX ROWID        | PS_GP_PYE_HIST_WRK |   416 | 57408 |     6   (0)| 00:00:01 |
|  *  8 |         INDEX RANGE SCAN                  | PS_GP_PYE_HIST_WRK |     1 |       |     5   (0)| 00:00:01 |
|  *  9 |        TABLE ACCESS BY INDEX ROWID BATCHED| PS_GP_RSLT_ACUM    |     1 |   132 |     5   (0)| 00:00:01 |
|  * 10 |         INDEX RANGE SCAN                  | PS_GP_RSLT_ACUM    |     1 |       |     4   (0)| 00:00:01 |
|- * 11 |      INDEX RANGE SCAN                     | PSBGP_ACCUMULATOR  |     1 |     7 |     1   (0)| 00:00:01 |
|    12 |     INDEX FAST FULL SCAN                  | PSBGP_ACCUMULATOR  |     1 |     7 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

   SQL Plan         SQL Plan                                               H   P E        ASH
 Hash Value          Line ID EVENT                                         P     x       Secs
----------- ---------------- --------------------------------------------- --- - --- --------
 1843812660               10 db file sequential read                       N   N Y         70
 1843812660                9 db file sequential read                       N   N Y         60
 1843812660                  CPU+CPU Wait                                  N   N Y         50

Table Cached Blocks 

The table_cached_blocks statistics preference specifies the average number of blocks assumed to be cached in the buffer cache when calculating the index clustering factor.  When DBMS_STATS calculates the clustering factor of an index, it does not count visits to table blocks that are assumed to be cached because they were among the last n distinct table blocks visited, where n is the value to which table_cached_blocks is set.

We have already seen that with 7 cluster key columns, no more than 7 blocks are required to hold any one cluster key.  If I set table_cached_blocks to at least 7, then when DBMS_STATS walks the table blocks in unique key order (which matches the cluster key order for the first 7 columns), it does not count additional visits to blocks for the same cluster key.  Thus we see a reduction in the clustering factor on the unique index.  There is no advantage in a higher value of this setting.  We do not see a significant reduction in the clustering factor on other indexes with different leading columns.
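The preference is set with DBMS_STATS, and takes effect the next time statistics are gathered.  Something like this (a sketch; SYSADM is the schema used throughout this series):

exec DBMS_STATS.SET_TABLE_PREFS('SYSADM','PS_GP_RSLT_ACUM','TABLE_CACHED_BLOCKS','8');
exec DBMS_STATS.GATHER_TABLE_STATS('SYSADM','PS_GP_RSLT_ACUM',cascade=>TRUE);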

TCB=1
TABLE_NAME           INDEX_NAME               PREFIX_LENGTH LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR DEGREE     LAST_ANALYZED    
-------------------- ------------------------ ------------- ----------- ---------- ----------------- ---------- -----------------
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS                       8     1271559  152019130          10806251 1          12-01-24 15:33:02
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM                      8     8421658  762210387         101166426 1          12-01-24 15:37:55
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN                       9     3894799  327189471          31774872 1          12-01-24 15:39:00
TCB=8
 TABLE_NAME           INDEX_NAME               PREFIX_LENGTH LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR DEGREE     LAST_ANALYZED    
-------------------- ------------------------ ------------- ----------- ---------- ----------------- ---------- -----------------
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS                       8     1271559  152019130           8217000 1          12-01-24 15:05:42
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM                      8     8421658  762210387          16658798 1          12-01-24 15:10:40
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN                       9     3894799  327189471          11321888 1          12-01-24 15:01:37

TCB=16

TABLE_NAME           INDEX_NAME               PREFIX_LENGTH LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR DEGREE     LAST_ANALYZED    
-------------------- ------------------------ ------------- ----------- ---------- ----------------- ---------- -----------------
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS                       8     1271559  152019130           8217000 1          12-01-24 15:44:25
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM                      8     8421658  762210387          16658710 1          12-01-24 15:49:29
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN                       9     3894799  327189471          11321888 1          12-01-24 15:50:36

The reduction in the clustering factor can mitigate the optimizer's tendency to use the cluster key index, but it may still occur.

NB: table_cached_blocks applies only when gathering statistics with DBMS_STATS; it does not apply to CREATE INDEX or ALTER INDEX … REBUILD operations, which use the default value of 1.  This is not a bug; it is documented behaviour of DBMS_STATS.  
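So, after rebuilding such an index, it may be worth re-gathering its statistics so that the table_cached_blocks preference is honoured again; for example (a sketch):

exec DBMS_STATS.GATHER_INDEX_STATS('SYSADM','PS_GP_RSLT_ACUM');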

TL;DR

The statistics on the cluster key index may lead the optimizer to determine that the cost of using it is lower than that of the unique index.  The switch from the unique/primary key index to the cluster key index may result in poorer performance.  Setting the table_cached_blocks preference on the tables in the cluster may help.  However, you may still need to use SQL Profiles/SQL Plan Baselines/SQL Patches to force the optimizer to continue to use the unique indexes.

Friday, February 16, 2024

Table Clusters: 4. Checking the Cluster Key

This post is part of a series that discusses table clustering in Oracle.
  1. Introduction and Ancient History
  2. Cluster & Cluster Key Design Considerations
  3. Populating the Cluster with DBMS_PARALLEL_EXECUTE
  4. Checking the Cluster Key
  5. Using the Cluster Key Index instead of the Primary/Unique Key Index
  6. Testing the Cluster & Conclusion (TL;DR)

This query calculates how many distinct blocks each cluster key occupies, and how often each number of blocks per key occurs.  It uses DBMS_ROWID to obtain the block number from each ROWID.

with x as ( --cluster key and rowid of each row
  select emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM
  ,      DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) block_no from ps_gp_rslt_abs
), y as ( --count number of rows per cluster key and block number
  select /*+MATERIALIZE*/ emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM
  ,      block_no, count(*) num_rows 
  from   x   
  group by emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM, block_no
), z as ( --count number of blocks and rows per cluster key
  select /*+MATERIALIZE*/ emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM
  ,      count(distinct block_no) num_blocks, sum(num_rows) num_rows 
  from   y
  group by emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM
)
select num_blocks, count(distinct emplid) emplids
,      sum(num_rows) sum_rows
,      median(num_rows) median_rows
,      median(num_rows)/num_blocks median_rows_per_block
from   z
group by num_blocks
order by num_blocks
/

The answer you get depends on the data, so your mileage will vary.  

Initially, I built the cluster with 3 columns in the key.  In my case, 81% of rows were organised such that they have no more than 2 data blocks per cluster key.   

NUM_BLOCKS    EMPLIDS   SUM_ROWS MEDIAN_ROWS MEDIAN_ROWS_PER_BLOCK
---------- ---------- ---------- ----------- ---------------------
         1      69638   46809975          12                    12
         2      47629   78370682          34                    17
         3      12120   14330976          68            22.6666667
         4       4598    4395844          94                  23.5
         5       2376    6941389         124                  24.8
         6        652    2510790         155            25.8333333
         7         27      34527         185            26.4285714
         8         14      12330         217                27.125
         9          9      40633         248            27.5555556
        10          1      14607         279                  27.9
        11          1        310         310            28.1818182
        12          2       2212         310            25.8333333
        13          1       1476         372            28.6153846
        14          1        372         372            26.5714286
I rebuilt the cluster with 7 key columns.  Now no cluster key has more than 7 blocks, most of the keys are in a single block, and 85% are in no more than 2.  Increasing the length of the cluster key also resulted in the table being smaller because each cluster key is only stored once.
NUM_BLOCKS    EMPLIDS   SUM_ROWS MEDIAN_ROWS MEDIAN_ROWS_PER_BLOCK
---------- ---------- ---------- ----------- ---------------------
         1      74545   71067239          14                    14
         2      52943   57481538          40                    20
         3      13553   11185685          73            24.3333333
         4       4567    8949787         120                    30
         5       1327    3251707         150                    30
         6        144      81977         160            26.6666667
         7          3       1197       204.5            29.2142857
There is now only a small number of employees whose data is spread across many cluster blocks.  They might be slower to access, but I think I have a reasonable balance.

Thursday, February 15, 2024

Table Clusters: 3. Populating the Cluster with DBMS_PARALLEL_EXECUTE

This post is part of a series that discusses table clustering in Oracle.

The result tables being clustered are also large, containing hundreds of millions of rows.  Normally, when I have to rebuild these as non-clustered tables, I would do so in direct-path mode and with both parallel insert and parallel query. However, this is not effective for table clusters, particularly if you put multiple tables in one cluster, as rows with the same cluster key have to go into the same data blocks.

Instead, for each result table in the cluster, I have used DBMS_PARALLEL_EXECUTE to take a simple INSERT…SELECT statement, and break it into pieces that can be run concurrently on the database job scheduler.  I get the parallelism, though I also have to accept the redo on the insert.

exec DBMS_PARALLEL_EXECUTE.DROP_TASK('CLUSTER_GP_RSLT_ABS');

DECLARE
  l_recname VARCHAR2(15) := 'GP_RSLT_ABS';
  l_src_prefix VARCHAR2(10) := 'ORIG_';
  l_task VARCHAR2(30);
  l_sql_stmt CLOB;
  l_col_list CLOB;
BEGIN
  l_task := 'CLUSTER_'||l_recname;
  
  SELECT LISTAGG(column_name,',') WITHIN GROUP(ORDER BY column_id) 
  INTO l_col_list 
  FROM user_tab_cols WHERE table_name = l_src_prefix||l_recname;
  
  l_sql_stmt := 'insert into PSY'||l_recname||' ('||l_col_list||') SELECT '||l_col_list
              ||' FROM '||l_src_prefix||l_recname||' WHERE rowid BETWEEN :start_id AND :end_id';
  
  DBMS_PARALLEL_EXECUTE.CREATE_TASK (l_task);
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(l_task, 'SYSADM', l_src_prefix||l_recname, true, 2e6);
  DBMS_PARALLEL_EXECUTE.RUN_TASK(l_task, l_sql_stmt, DBMS_SQL.NATIVE, parallel_level => 24);
END;
/

The performance of this process is the first indication as to whether the cluster key is correct.  With too few columns in the key, populating the table will be much slower, because each row has to go into the block already allocated to its cluster key or, if that block is full, into a newly allocated block.  

NB: Chunking the data by ROWID only works where the source table is a regular table.  It does not work for clustered or index-organised tables.  The alternative is to chunk by the value of a numeric column, and that doesn't work well in this case because most of the key columns are strings or dates.
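For reference, chunking on a numeric column would look something like this.  It is only a sketch: the column is illustrative rather than a good candidate here, and note that for CREATE_CHUNKS_BY_NUMBER_COL the chunk size is a range of column values, not a number of rows.

exec DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_NUMBER_COL('CLUSTER_GP_RSLT_ABS','SYSADM','ORIG_GP_RSLT_ABS','PIN_TAKE_NUM',100);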

Monitoring DBMS_PARALLEL_EXECUTE

There are several views provided by Oracle that can be used to monitor tasks created by DBMS_PARALLEL_EXECUTE.
SELECT * FROM user_parallel_execute_tasks;
                                                                            Number                                                
TASK_NAME            CHUNK_TYPE   STATUS     TABLE_OWNER TABLE_NAME         Column     TASK_COMMENT                   JOB_PREFIX  
-------------------- ------------ ---------- ----------- ------------------ ---------- ------------------------------ ------------
                                                                                               Apply                                           
                                                                                 Lang          X Ed    Fire_ Parallel                     
SQL_STMT                                                                         Flag EDITION  Trigger Apply    Level JOB_CLASS           
-------------------------------------------------------------------------------- ---- -------- ------- ----- -------- -----------------
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  FINISHED   SYSADM      PS_GP_RSLT_ABS                                               TASK$_38380  
insert into PSYGP_RSLT_ABS (EMPLID,CAL_RUN_ID,EMPL_RCD,GP_PAYGROUP,CAL_ID,ORIG_C    1 ORA$BASE         TRUE        24 DEFAULT_JOB_CLASS   

CLUSTER_GP_RSLT_ACUM ROWID_RANGE  FINISHED   SYSADM      PS_GP_RSLT_ACUM                                              TASK$_38382  
insert into PSYGP_RSLT_ACUM (EMPLID,CAL_RUN_ID,EMPL_RCD,GP_PAYGROUP,CAL_ID,ORIG_    1 ORA$BASE         TRUE        32 DEFAULT_JOB_CLASS
Each task is broken into chunks.
SELECT task_name, status, count(*) chunks
, min(start_ts) min_start_ts, max(end_ts) max_end_ts
, max(end_ts)-min(start_ts) duration
FROM user_parallel_execute_chunks 
group by task_name, status
order by min_start_ts nulls last
/

TASK_NAME            STATUS         CHUNKS MIN_START_TS            MAX_END_TS              DURATION
-------------------- ---------- ---------- ----------------------- ----------------------- -------------------
CLUSTER_GP_RSLT_ABS  PROCESSED          80 22/12/2023 09.58.37.712 22/12/2023 10.06.32.264 +00 00:07:54.551373
CLUSTER_GP_RSLT_ACUM PROCESSED         402 22/12/2023 10.08.58.257 22/12/2023 10.38.36.820 +00 00:29:38.562700
In this case, each chunk processes a range of ROWIDs.  Each chunk is allocated to a database scheduler job.
SELECT chunk_id, task_name, status, start_rowid, end_rowid, job_name, start_ts, end_ts, error_code, error_message
FROM user_parallel_execute_chunks 
WHERE task_name = 'CLUSTER_GP_RSLT_ABS'
ORDER BY chunk_id
/

Chunk                                                                                                                                                                                          
   ID TASK_NAME            STATUS     START_ROWID        END_ROWID          JOB_NAME        START_TS                END_TS                  ERROR_CODE ERROR_MESSAGE                           
----- -------------------- ---------- ------------------ ------------------ --------------- ----------------------- ----------------------- ---------- -----------------
    1 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAAAZgAAAA AAAUzUAAmAADp7VH// TASK$_38380_1   22/12/2023 09:58:37.712 22/12/2023 10:00:21.622                                                    
    2 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAADp7WAAA AAAUzUAAmAAGwkrH// TASK$_38380_3   22/12/2023 09:58:37.713 22/12/2023 10:00:20.107                                                    
    3 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAAGwksAAA AAAUzUAAmAAHvwBH// TASK$_38380_2   22/12/2023 09:58:37.713 22/12/2023 10:00:14.939                                                    
    4 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAAHvwCAAA AAAUzUAAmAAIn5XH// TASK$_38380_9   22/12/2023 09:58:37.864 22/12/2023 10:00:28.963                                                    
    5 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAAIn5YAAA AAAUzUAAmAAJ58tH// TASK$_38380_12  22/12/2023 09:58:37.865 22/12/2023 10:00:30.494                                                    
    6 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAAJ58uAAA AAAUzUAAmAAKzADH// TASK$_38380_8   22/12/2023 09:58:37.865 22/12/2023 10:00:26.049                                                    
    7 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAAKzAEAAA AAAUzUAAmAALf7ZH// TASK$_38380_4   22/12/2023 09:58:37.865 22/12/2023 10:00:28.017                                                    
    8 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAALf7aAAA AAAUzUAAmAAMHGvH// TASK$_38380_10  22/12/2023 09:58:37.885 22/12/2023 10:00:23.326                                                    
    9 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAAMHGwAAA AAAUzUAAmAAP5aFH// TASK$_38380_13  22/12/2023 09:58:37.907 22/12/2023 10:00:22.660                                                    
   10 CLUSTER_GP_RSLT_ABS  PROCESSED  AAAUzUAAmAAP5aGAAA AAAUzUAAnAACr1bH// TASK$_38380_5   22/12/2023 09:58:37.929 22/12/2023 10:00:21.959
…
However, one job may process many chunks.
SELECT t.task_name, t.chunk_type, t.table_name, c.chunk_id, c.job_name, c.start_ts, c.end_ts
, d.actual_start_date, d.run_duration, d.instance_id, d.session_id
FROM user_parallel_execute_tasks t
JOIN user_parallel_execute_chunks c ON c.task_name = t.task_name
JOIN user_scheduler_job_run_details d ON d.job_name = c.job_name
WHERE t.task_name = 'CLUSTER_GP_RSLT_ABS'
ORDER BY t.task_name, c.job_name, c.start_ts
/
                                                  Chunk                                                                                                             Inst             
TASK_NAME            CHUNK_TYPE   TABLE_NAME         ID JOB_NAME        START_TS                END_TS                  ACTUAL_START_DATE       RUN_DURATION          ID SESSION_ID  
-------------------- ------------ --------------- ----- --------------- ----------------------- ----------------------- ----------------------- ------------------- ---- ------------
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS      1 TASK$_38380_1   22/12/2023 09:58:37.712 22/12/2023 10:00:21.622 22/12/2023 09:58:37.660 +00 00:07:52.000000    1 3406,24003  
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS     23 TASK$_38380_1   22/12/2023 10:00:21.710 22/12/2023 10:02:01.916 22/12/2023 09:58:37.660 +00 00:07:52.000000    1 3406,24003  
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS     44 TASK$_38380_1   22/12/2023 10:02:02.008 22/12/2023 10:03:31.546 22/12/2023 09:58:37.660 +00 00:07:52.000000    1 3406,24003  
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS     57 TASK$_38380_1   22/12/2023 10:03:31.640 22/12/2023 10:05:05.398 22/12/2023 09:58:37.660 +00 00:07:52.000000    1 3406,24003  
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS     73 TASK$_38380_1   22/12/2023 10:05:05.494 22/12/2023 10:06:29.262 22/12/2023 09:58:37.660 +00 00:07:52.000000    1 3406,24003  

CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS      8 TASK$_38380_10  22/12/2023 09:58:37.885 22/12/2023 10:00:23.326 22/12/2023 09:58:37.877 +00 00:07:54.000000    1 4904,44975  
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS     27 TASK$_38380_10  22/12/2023 10:00:23.394 22/12/2023 10:01:59.096 22/12/2023 09:58:37.877 +00 00:07:54.000000    1 4904,44975  
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS     42 TASK$_38380_10  22/12/2023 10:01:59.185 22/12/2023 10:03:37.657 22/12/2023 09:58:37.877 +00 00:07:54.000000    1 4904,44975  
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS     61 TASK$_38380_10  22/12/2023 10:03:37.742 22/12/2023 10:05:12.680 22/12/2023 09:58:37.877 +00 00:07:54.000000    1 4904,44975  
CLUSTER_GP_RSLT_ABS  ROWID_RANGE  PS_GP_RSLT_ABS     79 TASK$_38380_10  22/12/2023 10:05:12.776 22/12/2023 10:06:32.142 22/12/2023 09:58:37.877 +00 00:07:54.000000    1 4904,44975  
…
You can also judge how well the clustering is working by looking at how much database time was consumed by the various events.  PS_GP_RSLT_ABS was inserted first, then PS_GP_RSLT_ACUM.  We can see that more time was spent on the second table to be inserted, and more of it was spent on physical read operations, because rows have to go into the specific blocks that already contain the same cluster keys.
select c.task_name, c.status, count(distinct c.chunk_id) chunks, h.module, h.event
, sum(usecs_per_Row)/1e6 ash_secs
from gv$active_session_history h
, user_parallel_execute_chunks c
, user_parallel_execute_tasks t
where h.sample_time BETWEEN c.start_ts AND NVL(c.end_ts,SYSDATE)
and t.task_name = c.task_name
and h.action like c.job_name
group by c.task_name, c.status, h.module, h.event
order by task_name, ash_Secs desc
/
TASK_NAME            STATUS     CHUNKS MODULE          EVENT                                                            ASH_SECS
-------------------- ---------- ------ --------------- ---------------------------------------------------------------- --------
CLUSTER_GP_RSLT_ABS  PROCESSED      80 DBMS_SCHEDULER                                                                       3534
                     PROCESSED      78 DBMS_SCHEDULER  enq: FB - contention                                                 1184
                     PROCESSED      80 DBMS_SCHEDULER  db file parallel read                                                1161
                     PROCESSED      80 DBMS_SCHEDULER  buffer busy waits                                                     674
                     PROCESSED      79 DBMS_SCHEDULER  db file scattered read                                                490
…

CLUSTER_GP_RSLT_ACUM PROCESSED     401 DBMS_SCHEDULER                                                                      10174
                     PROCESSED     401 DBMS_SCHEDULER  db file sequential read                                              8813
                     PROCESSED      32 DBMS_SCHEDULER  log file switch (archiving needed)                                   4623
                     PROCESSED     389 DBMS_SCHEDULER  db file parallel read                                                1396
                     PROCESSED     383 DBMS_SCHEDULER  db file scattered read                                               1346
                     PROCESSED     295 DBMS_SCHEDULER  buffer busy waits                                                     769
                     PROCESSED     287 DBMS_SCHEDULER  enq: FB - contention                                                  715
…

Wednesday, February 14, 2024

Table Clusters: 2. Cluster & Cluster Key Design Considerations

This post is part of a series that discusses table clustering in Oracle.

At the beginning of each PeopleSoft payroll calculation process, all the previously calculated result data that is about to be recalculated by that process is deleted; one delete statement for each result table.  The new result data is inserted as each employee is calculated.  As multiple calculation processes run concurrently, their data tends to get mixed up in the result tables.  So the delete statements will concurrently update different rows in the same data block, and the database has to do additional work to ensure read consistency.  
The result tables are not subsequently updated.  Therefore, they are reasonable candidates for building in a table cluster.

Cluster Design Considerations

The original purpose of table clusters was to co-locate rows from different tables that would generally be queried together, in the same data blocks.  This makes retrieval easier by reducing disk I/Os and access time.  Less storage is required because cluster keys are not repeated in either the cluster or the cluster key index.  As disks have become bigger and faster, and memory has become more plentiful, this is less often a consideration.

In this case, I am interested in avoiding read consistency contention.  I want each data block in the cluster to contain only rows with a single distinct cluster key value, so that different transactions, relating to different employees and therefore to different cluster keys, will touch different data blocks.  Therefore, each data block in the cluster will be subject to no more than one concurrent transaction, and the database will not have to maintain multiple read-consistent versions.  I will still avoid the read consistency overhead whether I store multiple tables in one cluster or different tables in different clusters.

The size attribute of the CREATE CLUSTER command specifies the amount of space in bytes reserved to store all rows with the same cluster key value.  Oracle will round it up to the next divisor of the block size.  Thus, if it is greater than half the size of the data block, the database will reserve at least one whole data block for each cluster value.  In my case, the data blocks are 8192 bytes (the default size), so I have set the size equal to the block size. 

I don't know in advance how many distinct cluster key values my data will have, and it will change over time.  Therefore, I will be creating indexed clusters, and I have to build a B-tree index on the cluster key.

I have found that the optimizer tends to choose the cluster key index rather than the longer unique index to search the table because it only has one row per cluster key and is, therefore, smaller and cheaper.  However, it may then have to scan all the blocks for that cluster key, which may in practice take longer.

If one table already frequently fills or exceeds a single block for each cluster key, there is unlikely to be any advantage to adding another table to the same cluster because if Oracle uses the cluster key index, it will then scan all the blocks for that key.  

In my case, I have found that two of the three tables that I plan to cluster each require more than one block per cluster key, and the third almost fills a block per cluster key.  Therefore, I have decided to put each table in a separate cluster, albeit with the same cluster key.

Cluster Key Design Considerations

The columns listed in the CREATE CLUSTER command specify the cluster key.  They will be used to group data together.  The tables in the cluster have many unique key columns in common.  The first 7 columns of the unique key have been used for cluster key columns.  This is enough to prevent the number of rows per cluster key from growing indefinitely, but not so many that you end up with only a few rows per cluster key, which would result in most table blocks being only partially filled.  This would consume space and increase I/O.

The cluster key is indexed to help find the data blocks for a particular key, just as you would on any other table.  You do not specify columns when creating this index, because it uses the cluster key columns.
CREATE CLUSTER cluster_gp_rslt_abs
(EMPLID VARCHAR2(11), CAL_RUN_ID VARCHAR2(18), EMPL_RCD SMALLINT, GP_PAYGROUP VARCHAR2(10)
,CAL_ID VARCHAR2(18), ORIG_CAL_RUN_ID VARCHAR2(18), RSLT_SEG_NUM SMALLINT)
SIZE 8192 /*one block per cluster value*/
TABLESPACE GPAPP
/
CREATE INDEX cluster_gp_rslt_abs_idx ON CLUSTER cluster_gp_rslt_abs
/

CREATE TABLE psygp_rslt_abs (EMPLID VARCHAR2(11) NOT NULL,
   CAL_RUN_ID  VARCHAR2(18) NOT NULL,
   EMPL_RCD    SMALLINT NOT NULL,
   GP_PAYGROUP VARCHAR2(10) NOT NULL,
   CAL_ID      VARCHAR2(18) NOT NULL,
   ORIG_CAL_RUN_ID VARCHAR2(18) NOT NULL,
   RSLT_SEG_NUM SMALLINT NOT NULL,
…
) CLUSTER cluster_gp_rslt_abs (EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM)
/

CREATE CLUSTER cluster_gp_rslt_acum
(EMPLID VARCHAR2(11), CAL_RUN_ID VARCHAR2(18), EMPL_RCD SMALLINT, GP_PAYGROUP VARCHAR2(10)
,CAL_ID VARCHAR2(18), ORIG_CAL_RUN_ID VARCHAR2(18), RSLT_SEG_NUM SMALLINT)
SIZE 8192 TABLESPACE GPAPP
/
CREATE INDEX cluster_gp_rslt_acum_idx ON CLUSTER cluster_gp_rslt_acum
/
CREATE TABLE psygp_rslt_acum (EMPLID VARCHAR2(11) NOT NULL,
…
) CLUSTER cluster_gp_rslt_acum (EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM)
/

CREATE CLUSTER cluster_gp_rslt_pin
(EMPLID VARCHAR2(11), CAL_RUN_ID VARCHAR2(18), EMPL_RCD SMALLINT, GP_PAYGROUP VARCHAR2(10)
,CAL_ID VARCHAR2(18), ORIG_CAL_RUN_ID VARCHAR2(18), RSLT_SEG_NUM SMALLINT)
SIZE 8192 TABLESPACE GPAPP
/
CREATE INDEX cluster_gp_rslt_pin_idx ON CLUSTER cluster_gp_rslt_pin
/
CREATE TABLE PSYGP_RSLT_PIN (EMPLID VARCHAR2(11) NOT NULL,
…
) CLUSTER cluster_gp_rslt_pin (EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM)
/
…

The application's indexes on the result tables, including the unique key indexes, were recreated after the tables had been rebuilt in the cluster and repopulated.  I have only shown the DDL for the unique indexes below.  Building an index on a clustered table is no different from building one on a normal heap table.

CREATE UNIQUE  INDEX PS_GP_RSLT_ABS ON PS_GP_RSLT_ABS 
(EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM, ABSENCE_DATE, PIN_TAKE_NUM) 
PCTFREE 1 COMPRESS 8 … TABLESPACE PSINDEX  
/
…
CREATE UNIQUE  INDEX PS_GP_RSLT_ACUM ON PS_GP_RSLT_ACUM 
(EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM, PIN_NUM, EMPL_RCD_ACUM
,ACM_FROM_DT, ACM_THRU_DT, SLICE_BGN_DT, SEQ_NUM8) 
PCTFREE 1 COMPRESS 8 … TABLESPACE PSINDEX  
/
…
CREATE UNIQUE  INDEX PS_GP_RSLT_PIN ON PS_GP_RSLT_PIN 
(EMPLID, CAL_RUN_ID, EMPL_RCD, GP_PAYGROUP, CAL_ID, ORIG_CAL_RUN_ID, RSLT_SEG_NUM, INSTANCE, PIN_NUM, SLICE_BGN_DT, SLICE_END_DT) 
PCTFREE 1 COMPUTE STATISTICS COMPRESS 9 … TABLESPACE PSINDEX 
/
…

See also Oracle 19c DBA Guide, Guidelines for Managing Clusters

Tuesday, February 13, 2024

Table Clusters: 1. An Alternative to Partitioning? - Introduction & Ancient History

This post is the first part of a series that discusses table clustering in Oracle.
Links will appear as sections are posted.

Introduction

Table clustering and table partitioning are very different technologies.  However, they both create a relationship between the logical value of the data and its physical location.  Similar data values are stored together, and therefore dissimilar data values are kept apart.  

The advantage of storing similar values together is to reduce I/O and improve access time.  However, this series of posts looks at the other side of that characteristic: keeping dissimilar values apart can, as with partitioning, be harnessed to avoid the need to maintain read consistency during concurrent processing, and therefore to avoid its overhead.

Partitioning is only available in the Enterprise Edition of Oracle, and then you have to license the partitioning option.  Table clustering is available in all database versions and doesn't require any additional licence.  So you might consider clustering when partitioning is not an option.

Ancient History

The last time I put tables into a cluster was in 2001 on Oracle 7.3.3 (partitioning didn't become available until Oracle 8.0).  Our problem was that multiple instances of the PeopleSoft Global Payroll calculation were concurrently updating different rows in the same data blocks leading the database to generate read consistent copies of each block for each session.  That consumed lots of CPU, required additional space in the buffer cache, generated additional physical reads on the undo segments, and generated additional writes due to delayed block cleanout of dirty data blocks in the buffer cache.  This significantly degraded performance, and very soon overall performance became worse as we increased the number of concurrent processes.

I had the idea of clustering the payroll tables on employee ID.  Thus I could ensure the data for different employees was in different data blocks and the database wouldn't have to do read-consistent recovery on the blocks in those tables.  There might still be some contention on indexes, but this would be less severe on indexes that lead on the cluster key columns because index entries are sorted in key order.

"A table cluster is a group of tables that share common columns and store related data in the same blocks … Because table clusters store related rows of different tables in the same data blocks, properly used table clusters offer the following benefits over non-clustered tables:

see Oracle 19c Database Concepts: Overview of Table Clusters

Table clusters were not fashionable then, and have certainly not become more so since, although we all use them every day: the Oracle catalogue has 37 tables in 10 clusters.  In 19c, the C_OBJ# cluster alone contains 17 tables!  When I proposed table clustering, the Swiss DBA turned to me and said 'If you build a cluster, I am going to a Kloster!' (the pun works in German: a 'Kloster' is a monastery or convent).  This rebuke has stayed with me ever since.

Nonetheless, we rebuilt our result tables in a cluster, and it delivered a performance improvement until the data volumes grew such that suddenly we had multiple data blocks per cluster key, and then the performance was much worse!  Our mistake was not having enough columns in the cluster key, thus illustrating that the choice of cluster keys is very important.

However, that forced the upgrade to Oracle 8i and we started to use table partitioning, such that a partition corresponded to the data processed by each concurrent payroll process.  That approach works very well, certainly better than clustering, for many customers who use this product and are licensed for partitioning.  They could generally scale the number of streams until they fully loaded either the CPU or the disk subsystem. 

Now in 2023, I am looking at another large PeopleSoft HCM implementation using the same calculation engine for absence, but this customer isn't licensed for partitioning, so we are back to table clusters.

Now read on.

Friday, January 26, 2024

Just because the execution plan says INMEMORY, it doesn't mean it is using In-Memory

Parallel Query

If you are using RAC, and you have in-memory objects populated across nodes (i.e. distribution by ROWID range), or objects populated on only one node (i.e. distribution by partition or sub-partition), then you need to use parallel query to access data populated on a node to which the query is not connected. 

  • There is no cache fusion with Database In-Memory.  Oracle does not ship In-Memory Compression Units (IMCUs) across the RAC interconnect.
  • Similarly, if you have set PARALLEL_FORCE_LOCAL=TRUE the parallel query will not be able to access the remote nodes.
In-memory improves performance by avoiding physical I/O, but the reduction in CPU consumption can be more significant.  In the cloud, this can save money by reducing your cloud subscription costs.  However, parallel query can be a brutal way of using CPU to complete a query faster.  It often increases total CPU consumption, thus negating some of the benefits of in-memory.

Options:

A query that is not executing in parallel will only be able to access objects in the local in-memory store.  You can ensure that a segment is stored in the in-memory store on every RAC node by specifying DUPLICATE ALL.  Parallel queries will also use the local in-memory store.  
  • This option can improve performance but the in-memory stores consume more memory.  On a 2-node RAC database, it doubles the memory consumption of In-Memory.
  • The DUPLICATE option is only available on Exadata.  On other platforms, it is ignored (see also Oracle Database In-Memory on RAC - Part 3).
Alternatively, you can use database services to create node affinity.  
  • A process can connect using a database service that specifies a specific node or nodes.  
  • Parallel queries can be restricted to specific nodes by setting PARALLEL_INSTANCE_GROUP to use a service (see also Oracle Database In-Memory on RAC - Part 2).
  • In-memory segments can be placed in the in-memory store on specific nodes by distributing them with a specific service (see also How to control where objects are populated into [In-]memory on RAC).
  • You may prefer to create different services for the query processes and in-memory population processes.  In the case of node failure, you probably want the query process connection to fail over to another node.  However, you may not want that to happen for in-memory distribution processes because of the additional memory overhead.
Otherwise, on a 2-node RAC, a non-parallel query has a 50% chance of finding the segment in the in-memory store because it has a 50% chance of connecting to the node where it is stored!
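A minimal sketch of the service-affinity approach, using the demonstration table from the next section.  It assumes a service called IMQRY has already been created with srvctl to run on only one of the instances; the names are hypothetical, and the DISTRIBUTE … FOR SERVICE clause requires 12.2 or later.

-- populate the segment only into the in-memory store(s) of the instance(s) running the IMQRY service
ALTER TABLE t INMEMORY DISTRIBUTE FOR SERVICE IMQRY;
-- restrict this session's parallel execution servers to the instances running that service
ALTER SESSION SET parallel_instance_group = 'IMQRY';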

Is It Using In Memory?

I am going to demonstrate this using a table with 2 partitions.
CREATE TABLE t (a number, b number, c VARCHAR2(1000)) PARTITION BY RANGE (b)
(partition t1 VALUES LESS THAN(50)       
,partition t2 VALUES LESS THAN(MAXVALUE)
) INMEMORY;
INSERT INTO t SELECT level, MOD(level,100),  RPAD(TO_CHAR(TO_DATE(level,'j'),'Jsp'),100,'.')
FROM DUAL CONNECT BY LEVEL <= 1e5;
commit;

Serial Query

I am going to generate execution plans for two similar queries that each access a different partition of the table.  The execution plans have the same plan hash value.  The only difference is that the first query accesses only the first partition, and the second query accesses only the second partition.  
Both plans claim that they are doing an INMEMORY full scan of the table.  However, this is only a statement of intent.
explain plan for SELECT sum(a), sum(b), count(*) FROM t WHERE b=42;
…
Plan hash value: 2993254470
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |    26 |    11   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE              |      |     1 |    26 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE     |      |   942 | 24492 |    11   (0)| 00:00:01 |     1 |     1 |
|*  3 |    TABLE ACCESS INMEMORY FULL| T    |   942 | 24492 |    11   (0)| 00:00:01 |     1 |     1 |
-----------------------------------------------------------------------------------------------------
explain plan for SELECT sum(a), sum(b), count(*) FROM t WHERE b=56;
…
Plan hash value: 2993254470
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |    26 |    11   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE              |      |     1 |    26 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE     |      |   926 | 24076 |    11   (0)| 00:00:01 |     2 |     2 |
|*  3 |    TABLE ACCESS INMEMORY FULL| T    |   926 | 24076 |    11   (0)| 00:00:01 |     2 |     2 |
-----------------------------------------------------------------------------------------------------
Oracle distributes the partitions across the in-memory stores of the RAC nodes.  In my case, the first partition is on instance 1, and the second partition is on instance 2.
select inst_id, owner, segment_name, partition_name, inmemory_size, bytes, bytes_not_populated, populate_status, inmemory_duplicate
from gv$im_segments where segment_name = 'T' order by inst_id;

   INST_ID OWNER      SEGMENT_NAME PARTITION_NAME INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED POPULATE_STAT INMEMORY_DUPL
---------- ---------- ------------ -------------- ------------- ---------- ------------------- ------------- -------------
         1 SYSADM     T            T1                   6422528    8241152                   0 COMPLETED     NO DUPLICATE
         2 SYSADM     T            T2                   6422528    8241152                   0 COMPLETED     NO DUPLICATE
I will run the queries, capturing the session statistics to a temporary table. I use the Oracle-delivered global temporary table plan_table so that I don't have to create my own table.
delete from plan_table;
insert into plan_table (statement_id, plan_id, id, cost, parent_id) select '_', s.* from v$mystat s;

SELECT sum(a), sum(b), count(*) FROM t WHERE b=42;

insert into plan_table (statement_id, plan_id, id, cost, parent_id) select 'A', s.* from v$mystat s;

SELECT sum(a), sum(b), count(*) FROM t WHERE b=56;

insert into plan_table (statement_id, plan_id, id, cost, parent_id) select 'B', s.* from v$mystat s;
Then I can simply query where the IM statistics are different.
with x (scenario, sid, statistic#, value) as (select statement_id, plan_id, id, cost from plan_table)
select x.statistic#, n.name 
, a.value-x.value diff_a
, b.value-a.value diff_b
from v$statname n, x, x a, x b
where x.scenario = '_'
and x.sid = a.sid and x.statistic# = a.statistic# and a.scenario = 'A'
and x.sid = b.sid and x.statistic# = b.statistic# and b.scenario = 'B'
and (x.value < a.value OR a.value < b.value)
and n.statistic# = x.statistic#
and n.name like 'IM %' and not n.name like 'IM %populate%'
order by x.statistic#;
I only got an in-memory query for the partition populated on instance 2.  For the partition populated on instance 1, there is just a single IM scan segments disk operation.  This is the 'number of times a segment marked for in-memory was accessed entirely from the buffer cache/direct read' (see Popular Statistics with Database In-Memory).  It indicates that there was no in-memory query. 
STATISTIC# NAME                                                   DIFF_A     DIFF_B
---------- -------------------------------------------------- ---------- ----------
       772 IM scan CUs no cleanout                                     0          1
       802 IM scan CUs current                                         0          1
       830 IM scan CUs readlist creation accumulated time              0          2
       832 IM scan CUs readlist creation number                        0          1
       838 IM scan delta - only base scan                              0          1
      1376 IM scan CUs pcode aggregation pushdown                      0          3
      1377 IM scan rows pcode aggregated                               0       1000
      1379 IM scan CUs pcode pred evaled                               0          1
      1385 IM scan dict engine results reused                          0          3
      1480 IM scan CUs memcompress for query low                       0          1
      1493 IM scan segments disk                                       1          0
      1494 IM scan bytes in-memory                                     0    5940559
      1495 IM scan bytes uncompressed                                  0    5444950
      1496 IM scan CUs columns accessed                                0          2
      1498 IM scan CUs columns theoretical max                         0          3
      1505 IM scan rows                                                0      50000
      1506 IM simd compare calls                                       0          3
      1512 IM simd decode unpack calls                                 0          6
      1513 IM simd decode symbol calls                                 0          2
      1520 IM simd decode unpack selective calls                       0          6
      1527 IM scan rows valid                                          0      50000
      1533 IM scan rows projected                                      0          1
      1538 IM scan CUs split pieces                                    0          1
      1571 IM scan CUs predicates received                             0          1
      1572 IM scan CUs predicates applied                              0          1
      1577 IM scan segments minmax eligible                            0          1
      1611 IM SubCU-MM CUs Examined                                    0          1

Parallel Query 

I will repeat the test, but use a parallel hint to enable parallel query.
SELECT /*+PARALLEL*/ sum(a), sum(b), count(*) FROM t WHERE b=42;
Now, I get a parallel execution plan.
Plan hash value: 943991435
 
-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |     1 |    26 |     6   (0)| 00:00:01 |       |       |        |      |            |
|   1 |  SORT AGGREGATE                 |          |     1 |    26 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR                |          |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)          | :TQ10000 |     1 |    26 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE              |          |     1 |    26 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR          |          |   942 | 24492 |     6   (0)| 00:00:01 |     1 |     1 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS INMEMORY FULL| T        |   942 | 24492 |     6   (0)| 00:00:01 |     1 |     1 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------
The IM statistics show that both queries performed an in-memory query.
STATISTIC# NAME                                                   DIFF_A     DIFF_B
---------- -------------------------------------------------- ---------- ----------
       772 IM scan CUs no cleanout                                     3          3
       802 IM scan CUs current                                         3          3
       830 IM scan CUs readlist creation accumulated time              4          4
       832 IM scan CUs readlist creation number                        3          3
       838 IM scan delta - only base scan                              3          3
      1376 IM scan CUs pcode aggregation pushdown                      9          9
      1377 IM scan rows pcode aggregated                            1000       1000
      1379 IM scan CUs pcode pred evaled                               3          3
      1385 IM scan dict engine results reused                          9          9
      1480 IM scan CUs memcompress for query low                       3          3
      1494 IM scan bytes in-memory                              17819283   17821701
      1495 IM scan bytes uncompressed                           16328826   16334850
      1496 IM scan CUs columns accessed                                6          6
      1498 IM scan CUs columns theoretical max                         9          9
      1505 IM scan rows                                           150000     150000
      1506 IM simd compare calls                                       9          9
      1512 IM simd decode unpack calls                                18         18
      1513 IM simd decode symbol calls                                 6          6
      1520 IM simd decode unpack selective calls                      18         18
      1527 IM scan rows valid                                      50000      50000
      1529 IM scan rows range excluded                            100000     100000
      1533 IM scan rows projected                                      3          3
      1538 IM scan CUs split pieces                                    6          3
      1571 IM scan CUs predicates received                             3          3
      1572 IM scan CUs predicates applied                              3          3
      1577 IM scan segments minmax eligible                            3          3
      1611 IM SubCU-MM CUs Examined                                    3          3

Duplicate In-Memory Store 

This time, I will repeat the test with a duplicate in-memory store. The DUPLICATE option stores the segment in the in-memory store on one other RAC node; the DUPLICATE ALL option stores it on all RAC nodes. On a 2-node RAC, they come to the same thing.
CREATE TABLE t (a number, b number, c VARCHAR2(1000)) PARTITION BY RANGE (b)
(partition t1 VALUES LESS THAN(50)       
,partition t2 VALUES LESS THAN(MAXVALUE)
) INMEMORY DUPLICATE ALL;
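As an aside, had the table already existed, the same attribute could have been applied afterwards with ALTER TABLE rather than recreating it (a minimal sketch; note that the DUPLICATE options are only honoured on engineered systems such as Exadata).
ALTER TABLE t INMEMORY DUPLICATE ALL;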
Now, both partitions are stored on both instances.
select inst_id, owner, segment_name, partition_name, inmemory_size, bytes, bytes_not_populated, populate_status, inmemory_duplicate
from gv$im_segments where segment_name = 'T' order by inst_id, segment_name, partition_name;

   INST_ID OWNER      SEGMENT_NAME PARTITION_NAME INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED POPULATE_STAT INMEMORY_DUPL
---------- ---------- ------------ -------------- ------------- ---------- ------------------- ------------- -------------
         1 SYSADM     T            T1                   6422528    8241152                   0 COMPLETED     DUPLICATE
         1 SYSADM     T            T2                   6422528    8241152                   0 COMPLETED     DUPLICATE
         2 SYSADM     T            T1                   6422528    8241152                   0 COMPLETED     DUPLICATE
         2 SYSADM     T            T2                   9568256    8241152                   0 COMPLETED     DUPLICATE 
I will return to the original queries without the parallel hints.
SELECT sum(a), sum(b), count(*) FROM t WHERE b=42;
SELECT sum(a), sum(b), count(*) FROM t WHERE b=56;
The in-memory statistics are the same for both queries, indicating that an in-memory query was successfully performed for both partitions, because both partitions are stored in the in-memory store on both instances.
STATISTIC# NAME                                                   DIFF_A     DIFF_B
---------- -------------------------------------------------- ---------- ----------
       772 IM scan CUs no cleanout                                     1          1
       802 IM scan CUs current                                         1          1
       830 IM scan CUs readlist creation accumulated time              3          2
       832 IM scan CUs readlist creation number                        1          1
       838 IM scan delta - only base scan                              1          1
      1376 IM scan CUs pcode aggregation pushdown                      3          3
      1377 IM scan rows pcode aggregated                            1000       1000
      1379 IM scan CUs pcode pred evaled                               1          1
      1385 IM scan dict engine results reused                          3          3
      1480 IM scan CUs memcompress for query low                       1          1
      1494 IM scan bytes in-memory                               5939777    5940563
      1495 IM scan bytes uncompressed                            5442942    5444950
      1496 IM scan CUs columns accessed                                2          2
      1498 IM scan CUs columns theoretical max                         3          3
      1505 IM scan rows                                            50000      50000
      1506 IM simd compare calls                                       3          3
      1512 IM simd decode unpack calls                                 6          6
      1513 IM simd decode symbol calls                                 2          2
      1520 IM simd decode unpack selective calls                       6          6
      1527 IM scan rows valid                                      50000      50000
      1533 IM scan rows projected                                      1          1
      1538 IM scan CUs split pieces                                    2          2
      1571 IM scan CUs predicates received                             1          1
      1572 IM scan CUs predicates applied                              1          1
      1577 IM scan segments minmax eligible                            1          1
      1611 IM SubCU-MM CUs Examined                                    1          1

TL;DR 

The presence of an in-memory operation in an execution plan does not mean that the statement is definitely using in-memory. Rather, it means that it will perform an in-memory query if it finds the segment in the in-memory store and its content is up to date. 
To determine whether a query really did use in-memory, look at the session-level statistics, as I have demonstrated in this blog, or at a SQL Monitor active report (see Oracle Database In-Memory on RAC - Part 1 (revised)). 
Parallel query is needed to access an object stored in-memory on a node other than the one to which the session is connected. If the query is not run in parallel, it will not be able to access that in-memory content; this is indicated by the 'IM scan segments disk' statistic. Alternatives are to duplicate the in-memory store (on Exadata) or to use services to create node affinity.
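As a minimal sketch of that check (assuming access to V$MYSTAT and V$STATNAME), the session's In-Memory statistics can be captured before and after the query and the deltas compared; an increase in 'IM scan segments disk' indicates the segment was read from disk or the buffer cache rather than from the column store.
SELECT sn.name, ms.value
FROM   v$statname sn
JOIN   v$mystat  ms ON ms.statistic# = sn.statistic#
WHERE  sn.name IN ('IM scan rows', 'IM scan segments disk')
ORDER BY sn.statistic#;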

My thanks to Andy Rivenes for the initial comment that sent me off into this subject, and to the various articles that he and Maria Colgan have posted on the Oracle Database In-Memory blog that I have linked in this note.

Wednesday, January 03, 2024

Job Chains

I have a requirement to run several concurrent jobs, and then, only when they have all finished, I want to run another job.  Rather than create several stand-alone jobs, I can create a chain of sub-jobs.

  • Each step in the chain maps to a program that can invoke a PL/SQL procedure, a PL/SQL block, a SQL script, or an external program.
  • Each step has a rule that includes a condition that determines when it starts.  Thus, a step can start after one or more other steps have completed or succeeded.
  • A priority can be specified on the program; all other factors being equal, it determines the order in which the scheduler runs the programs.
  • The number of jobs that are permitted to run concurrently can be controlled with a user-defined scheduler resource.  The resource is defined as having a number of units.  The number of units consumed by a job can be specified in an attribute of a stand-alone job.  In a job chain, the resource consumption attribute is applied to the program called from the chain step, rather than to the job.  Only as many jobs as there are units available are executed concurrently.

Job Chain Parameters

User-defined parameters can be passed into a stand-alone job, but not (as far as I have been able to find out) into steps in a job chain.  Instead, job chain metadata, including the job name and sub-name, can be passed as parameters, and the application parameters for each step can then be looked up in a parameter table.  

This naturally leads to a data-driven approach to managing chains, starting with a parameter table containing metadata from which to create a job chain.  Then, when the chain executes, the programs can look up the parameters from the same table, and update other values on it for logging.

Demonstration

  • In this example, all the jobs will execute the same stored PL/SQL procedure.
  • 10 jobs will each run for a different specified amount of time.  
  • I want to run the longest ones first, so they will be given higher priority.
  • The jobs will each consume a different number of units of a user-defined resource.  Therefore, the resource will constrain how many jobs can run concurrently.
  • A final job will only run when the first 10 jobs have all been completed.

Parameter Table

I will start by creating a parameter table that will be used to create the job chain.  It will contain a row for each step in the chain.

create table test_chain
(seq            INTEGER
,chain_name     VARCHAR2(128)
,step_name      VARCHAR2(24)
,program_name   VARCHAR2(128)
,program_action VARCHAR2(128)
,resource_units NUMBER 
,priority       INTEGER DEFAULT 3 NOT NULL CONSTRAINT test_chain_priority_chk CHECK (priority IN(1,2,3,4,5))
,condition      VARCHAR2(4000) DEFAULT 'TRUE' NOT NULL
,end_step       VARCHAR2(1) DEFAULT 'N' NOT NULL CONSTRAINT test_chain_end_step_chk CHECK (end_step IN('Y','N'))
,seconds        number
,begindttm      timestamp
,enddttm        timestamp
,CONSTRAINT test_chain_uk PRIMARY KEY (chain_name, step_name)
);
The parameter table is populated with the chain steps.
truncate table test_chain;
insert into test_chain
(seq, chain_name, step_name, program_name, program_action, resource_units, condition, priority, seconds)
select 1, 'TEST_CHAIN_1', 'CHAIN_STEP_'||level, 'TEST_PROGRAM_'||level, 'TEST_PROCEDURE'
, level resource_units, 'TRUE' condition, NTILE(5) OVER (order by level desc) priority, 10*level seconds
from dual connect by level <= 10
/
insert into test_chain
(seq, chain_name, step_name, program_name, program_action, seconds)
select 2, 'TEST_CHAIN_1', 'CHAIN_STEP_LAST', 'TEST_PROGRAM_LAST', 'TEST_PROCEDURE', 1
from dual 
/
update test_chain c
set condition = (
   SELECT LISTAGG(':'||b.step_name||'.state=''SUCCEEDED''',' AND ') WITHIN GROUP (ORDER BY b.step_name)
   FROM   test_chain b
   WHERE  b.seq = c.seq-1
   and    b.chain_name = c.chain_name)
where seq = 2 and chain_name = 'TEST_CHAIN_1'
/
insert into test_chain
(seq, chain_name, step_name, end_step, condition, seconds)
select 3, 'TEST_CHAIN_1', 'CHAIN_STEP_END', 'Y', ':CHAIN_STEP_LAST.state=''SUCCEEDED''', 1
from dual 
/
commit;
The chain steps are in 3 sequenced groups. 
  1. 10 concurrent jobs that run first.
  2. A job that runs after the first 10 jobs have been completed. Its start condition is generated with a LISTAGG() function that lists the 10 steps in sequence 1.
  3. A step that specifies the end of the chain. It is dependent on the job in sequence 2.  There is no program for this step.
column seq format 99
column chain_name format a20
column step_name format a20
column program_name format a25
column program_action format a25 wrapped on
column resource_units heading 'Res.|Units' format 99999
column condition format a40
column units format 999
column seconds format 999

select c.* 
from test_chain c
where chain_name = 'TEST_CHAIN_1'
order by seq, resource_units;

                                                                          Res.
SEQ CHAIN_NAME    STEP_NAME        PROGRAM_NAME       PROGRAM_ACTION     Units   PRIORITY CONDITION                                E SECONDS 
--- ------------- ---------------- ------------------ ------------------ ----- ---------- ---------------------------------------- - ------- 
  1 TEST_CHAIN_1  CHAIN_STEP_1     TEST_PROGRAM_1     TEST_PROCEDURE         1          5 TRUE                                     N      10                                                              
  1 TEST_CHAIN_1  CHAIN_STEP_2     TEST_PROGRAM_2     TEST_PROCEDURE         2          5 TRUE                                     N      20                                                              
  1 TEST_CHAIN_1  CHAIN_STEP_3     TEST_PROGRAM_3     TEST_PROCEDURE         3          4 TRUE                                     N      30                                                              
  1 TEST_CHAIN_1  CHAIN_STEP_4     TEST_PROGRAM_4     TEST_PROCEDURE         4          4 TRUE                                     N      40                                                              
  1 TEST_CHAIN_1  CHAIN_STEP_5     TEST_PROGRAM_5     TEST_PROCEDURE         5          3 TRUE                                     N      50                                                              
  1 TEST_CHAIN_1  CHAIN_STEP_6     TEST_PROGRAM_6     TEST_PROCEDURE         6          3 TRUE                                     N      60                                                              
  1 TEST_CHAIN_1  CHAIN_STEP_7     TEST_PROGRAM_7     TEST_PROCEDURE         7          2 TRUE                                     N      70                                                              
  1 TEST_CHAIN_1  CHAIN_STEP_8     TEST_PROGRAM_8     TEST_PROCEDURE         8          2 TRUE                                     N      80                                                              
  1 TEST_CHAIN_1  CHAIN_STEP_9     TEST_PROGRAM_9     TEST_PROCEDURE         9          1 TRUE                                     N      90                                                              
  1 TEST_CHAIN_1  CHAIN_STEP_10    TEST_PROGRAM_10    TEST_PROCEDURE        10          1 TRUE                                     N     100                                                              
  2 TEST_CHAIN_1  CHAIN_STEP_LAST  TEST_PROGRAM_LAST  TEST_PROCEDURE                    3 :CHAIN_STEP_1.state='SUCCEEDED' AND :CHA N       1                                                              
                                                                                          IN_STEP_10.state='SUCCEEDED' AND :CHAIN_                                                                        
                                                                                          STEP_2.state='SUCCEEDED' AND :CHAIN_STEP                                                                        
                                                                                          _3.state='SUCCEEDED' AND :CHAIN_STEP_4.s                                                                        
                                                                                          tate='SUCCEEDED' AND :CHAIN_STEP_5.state                                                                        
                                                                                          ='SUCCEEDED' AND :CHAIN_STEP_6.state='SU                                                                        
                                                                                          CCEEDED' AND :CHAIN_STEP_7.state='SUCCEE                                                                        
                                                                                          DED' AND :CHAIN_STEP_8.state='SUCCEEDED'                                                                        
                                                                                          AND :CHAIN_STEP_9.state='SUCCEEDED'                                                                            
  3 TEST_CHAIN_1  CHAIN_STEP_END                                                        3 :CHAIN_STEP_LAST.state='SUCCEEDED'       Y       1                                                              
It is only necessary to have multiple programs if you need to execute different procedures, use different priorities, or use different amounts of a resource.  In this example, each step has a different program even though they all execute the same procedure because I want to demonstrate the effect of different amounts of resource consumption and different priorities.

Test Procedure 

This procedure will be called by the chain steps. The chain step name will be passed to the test procedure as a parameter. The first update statement both updates BEGINDTTM on the parameter table and fetches the number of seconds for which the procedure is to sleep.
create or replace procedure test_procedure
(p_step_name VARCHAR2) as 
  k_module CONSTANT v$session.module%TYPE := $$PLSQL_UNIT;
  l_module v$session.module%TYPE;
  l_action v$session.action%TYPE;
  l_seconds test_chain.seconds%TYPE;
BEGIN
  dbms_application_info.read_module(l_module, l_action);
  dbms_application_info.set_module(k_module, p_step_name);

  UPDATE test_chain
  SET begindttm = SYSTIMESTAMP
  WHERE step_name = p_step_name
  RETURNING seconds INTO l_seconds;
  COMMIT;
  
  dbms_output.put_line(k_module||'.'||p_step_name||':'||l_seconds);
  dbms_lock.sleep(l_seconds);

  UPDATE test_chain
  SET enddttm = SYSTIMESTAMP
  WHERE step_name = p_step_name;
  COMMIT;

  dbms_application_info.set_module(l_module, l_action);
EXCEPTION
  WHEN OTHERS THEN
    dbms_application_info.set_module(l_module, l_action);
    RAISE;
END;
/
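The procedure can be sanity-checked stand-alone before it is wired into the chain (assuming the parameter table has already been populated, this call will sleep for the 10 seconds recorded against CHAIN_STEP_1).
set serveroutput on
exec test_procedure('CHAIN_STEP_1');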

Creating the Chain 

Then the parameter table is used to create the chain, programs, chain rules, and job that will be executed.
DECLARE
  e_scheduler_chain_does_not_exist EXCEPTION;
  PRAGMA exception_init(e_scheduler_chain_does_not_exist,-23308);
  e_scheduler_job_does_not_exist EXCEPTION;
  PRAGMA exception_init(e_scheduler_job_does_not_exist,-27475);
  e_scheduler_object_does_not_exist EXCEPTION;
  PRAGMA exception_init(e_scheduler_object_does_not_exist,-27476);
  e_scheduler_object_already_exists EXCEPTION;
  PRAGMA exception_init(e_scheduler_object_already_exists,-27477);

  l_job_suffix CONSTANT VARCHAR2(10) := '_JOB';
  l_resource_suffix CONSTANT VARCHAR2(10) := '_RESOURCE';
BEGIN
  FOR i IN (SELECT DISTINCT chain_name FROM test_chain) LOOP
    BEGIN --drop resource if already present
      DBMS_SCHEDULER.drop_resource (resource_name => i.chain_name||l_resource_suffix);
    EXCEPTION WHEN e_scheduler_object_does_not_exist THEN NULL;
    END;
    DBMS_SCHEDULER.create_resource ( --recreate resource
      resource_name    => i.chain_name||l_resource_suffix,
      units            => 10,
      status           => 'ENFORCE_CONSTRAINTS', -- Default
      constraint_level => 'JOB_LEVEL');       -- Default
  
    BEGIN --drop scheduler job if already present
      DBMS_SCHEDULER.drop_job(job_name => i.chain_name||l_job_suffix);
    EXCEPTION WHEN e_scheduler_job_does_not_exist THEN NULL;
    END;
    
    BEGIN --drop chain if already present
      DBMS_SCHEDULER.drop_chain (chain_name => i.chain_name, force=>TRUE);    
    EXCEPTION WHEN e_scheduler_chain_does_not_exist THEN NULL;
    END;
    DBMS_SCHEDULER.create_chain ( --recreate chain
      chain_name          => i.chain_name,
      rule_set_name       => NULL,
      evaluation_interval => NULL);
  END LOOP;
  
  FOR i IN (
    select c.* from test_chain c
    ORDER BY seq, priority, resource_units desc
  ) LOOP
    dbms_output.put_line(i.chain_name||', Step:'||i.step_name||', Condition:'||i.condition); 

    IF i.program_name IS NOT NULL THEN
      BEGIN
        DBMS_SCHEDULER.create_program ( --create program to call stored procedure
          program_name   => i.program_name,
          program_type   => 'STORED_PROCEDURE',
          program_action => i.program_action,
          number_of_arguments => 1,
          enabled        => FALSE,
          comments       => 'Program for chain:'||i.chain_name||', step:'||i.step_name);
        DBMS_SCHEDULER.DEFINE_METADATA_ARGUMENT( --pass job_subname as first parameter
          program_name   => i.program_name,
          metadata_attribute => 'job_subname', 
          argument_position => 1);
        DBMS_SCHEDULER.set_attribute ( --apply priority to program
          name      => i.program_name, 
          attribute => 'job_priority',
          value     => i.priority);
        DBMS_SCHEDULER.set_resource_constraint ( --apply resource consumption constraint to program
          object_name   => i.program_name, --cannot go on step
          resource_name => i.chain_name||l_resource_suffix,
          units         => i.resource_units);     
        dbms_scheduler.enable(i.program_name);
        dbms_output.put_line(i.chain_name||', Step:'||i.step_name||', Program:'||i.program_name); 
      EXCEPTION WHEN e_scheduler_object_already_exists THEN NULL;
      END;

      DBMS_SCHEDULER.define_chain_step ( --create chain step to call program
        chain_name   => i.chain_name,
        step_name    => i.step_name,
        program_name => i.program_name);
    END IF;

    IF i.end_step = 'Y' THEN --if last step in chain
      DBMS_SCHEDULER.define_chain_rule ( -- create job chain end step
        chain_name => i.chain_name,
        condition  => i.condition,
        action     => 'END',
        rule_name  => i.step_name, 
        comments   => 'End of chain '||i.chain_name);
      DBMS_SCHEDULER.enable (i.chain_name); --enable the chain
      dbms_scheduler.create_job ( --create a job to execute the chain once
        job_name=> i.chain_name||l_job_suffix,
        job_type=> 'CHAIN',
        job_action=> i.chain_name,
        start_date=> sysdate,
        enabled=> FALSE);
    ELSE --otherwise create an ordinary job rule for each step
      DBMS_SCHEDULER.define_chain_rule (
        chain_name => i.chain_name,
        condition  => i.condition,
        action     => 'START "'||i.step_name||'"',
        rule_name  => i.step_name, 
        comments   => 'Sequence '||i.seq);
    END IF;

END LOOP;
END;
/

TEST_CHAIN_1, Step:CHAIN_STEP_10, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_10, Program:TEST_PROGRAM_10
TEST_CHAIN_1, Step:CHAIN_STEP_9, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_9, Program:TEST_PROGRAM_9
TEST_CHAIN_1, Step:CHAIN_STEP_8, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_8, Program:TEST_PROGRAM_8
TEST_CHAIN_1, Step:CHAIN_STEP_7, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_7, Program:TEST_PROGRAM_7
TEST_CHAIN_1, Step:CHAIN_STEP_6, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_6, Program:TEST_PROGRAM_6
TEST_CHAIN_1, Step:CHAIN_STEP_5, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_5, Program:TEST_PROGRAM_5
TEST_CHAIN_1, Step:CHAIN_STEP_4, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_4, Program:TEST_PROGRAM_4
TEST_CHAIN_1, Step:CHAIN_STEP_3, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_3, Program:TEST_PROGRAM_3
TEST_CHAIN_1, Step:CHAIN_STEP_2, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_2, Program:TEST_PROGRAM_2
TEST_CHAIN_1, Step:CHAIN_STEP_1, Condition:TRUE
TEST_CHAIN_1, Step:CHAIN_STEP_1, Program:TEST_PROGRAM_1
TEST_CHAIN_1, Step:CHAIN_STEP_LAST, Condition::CHAIN_STEP_1.state='SUCCEEDED' AND :CHAIN_STEP_10.state='SUCCEEDED' AND :CHAIN_STEP_2.state='SUCCEEDED' 
AND :CHAIN_STEP_3.state='SUCCEEDED' AND :CHAIN_STEP_4.state='SUCCEEDED' AND :CHAIN_STEP_5.state='SUCCEEDED' AND :CHAIN_STEP_6.state='SUCCEEDED' 
AND :CHAIN_STEP_7.state='SUCCEEDED' AND :CHAIN_STEP_8.state='SUCCEEDED' AND :CHAIN_STEP_9.state='SUCCEEDED'
TEST_CHAIN_1, Step:CHAIN_STEP_LAST, Program:TEST_PROGRAM_LAST
TEST_CHAIN_1, Step:CHAIN_STEP_END, Condition::CHAIN_STEP_LAST.state='SUCCEEDED'

PL/SQL procedure successfully completed.

Exploring the Chain

Various views are available to see how the chain is defined.
select * from all_scheduler_resources WHERE resource_name like 'TEST_CHAIN%';

                                                                                     Jobs
                                                                 Resource             Run
OWNER      RESOURCE_NAME                    STATUS                  Units UNITS_USED Count COMMENTS
---------- -------------------------------- -------------------- -------- ---------- ----- --------------------
SYSADM     TEST_CHAIN_1_RESOURCE            ENFORCE_CONSTRAINTS        10          0     0
SELECT owner,chain_name,rule_set_owner,rule_set_name,number_of_rules,number_of_steps,enabled,comments
FROM   all_scheduler_chains
WHERE chain_name like 'TEST_CHAIN%';

                                Rule Set
OWNER      CHAIN_NAME           Owner      RULE_SET_NAME   NUMBER_OF_RULES NUMBER_OF_STEPS ENABLED COMMENTS
---------- -------------------- ---------- --------------- --------------- --------------- ------- ----------------------------------------
SYSADM     TEST_CHAIN_1         SYSADM     SCHED_RULESET$7              12              11 TRUE
SELECT owner, program_name, program_type, program_action, number_of_arguments, enabled, priority, weight, has_Constraints, comments
FROM all_SCHEDULER_PROGRAMS
WHERE PROGRAM_NAME LIKE 'TEST_PROGRAM%';

                                                                  Num                  Has
OWNER      PROGRAM_NAME         PROGRAM_TYPE     PROGRAM_ACTION  Args ENABLED Prio Wgt Const. COMMENTS
---------- -------------------- ---------------- --------------- ---- ------- ---- --- ------ ------------------------------------------------------------
SYSADM     TEST_PROGRAM_10      STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       1   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_10
SYSADM     TEST_PROGRAM_9       STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       1   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_9
SYSADM     TEST_PROGRAM_8       STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       2   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_8
SYSADM     TEST_PROGRAM_7       STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       2   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_7
SYSADM     TEST_PROGRAM_6       STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       3   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_6
SYSADM     TEST_PROGRAM_5       STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       3   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_5
SYSADM     TEST_PROGRAM_4       STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       4   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_4
SYSADM     TEST_PROGRAM_3       STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       4   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_3
SYSADM     TEST_PROGRAM_2       STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       5   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_2
SYSADM     TEST_PROGRAM_1       STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       5   1 TRUE   Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_1
SYSADM     TEST_PROGRAM_LAST    STORED_PROCEDURE TEST_PROCEDURE     1 TRUE       3   1 FALSE  Program for chain:TEST_CHAIN_1, step:CHAIN_STEP_LAST
SELECT owner, chain_name, step_name, program_owner, program_name, step_type
FROM   all_scheduler_chain_steps
WHERE  chain_name like 'TEST_CHAIN%'
ORDER BY owner, chain_name, step_name;

                                                          Program
OWNER      CHAIN_NAME           STEP_NAME                 Owner      PROGRAM_NAME         STEP_TYPE
---------- -------------------- ------------------------- ---------- -------------------- ----------
SYSADM     TEST_CHAIN_1         CHAIN_STEP_1              SYSADM     TEST_PROGRAM_1       PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_10             SYSADM     TEST_PROGRAM_10      PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_2              SYSADM     TEST_PROGRAM_2       PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_3              SYSADM     TEST_PROGRAM_3       PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_4              SYSADM     TEST_PROGRAM_4       PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_5              SYSADM     TEST_PROGRAM_5       PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_6              SYSADM     TEST_PROGRAM_6       PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_7              SYSADM     TEST_PROGRAM_7       PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_8              SYSADM     TEST_PROGRAM_8       PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_9              SYSADM     TEST_PROGRAM_9       PROGRAM
SYSADM     TEST_CHAIN_1         CHAIN_STEP_LAST           SYSADM     TEST_PROGRAM_LAST    PROGRAM
SELECT owner,chain_name,rule_owner,rule_name,condition,action,comments
FROM   all_scheduler_chain_rules
WHERE chain_name like 'TEST_CHAIN%'
ORDER BY owner, chain_name, rule_owner, rule_name;

                         Rule
OWNER      CHAIN_NAME    Owner      RULE_NAME          CONDITION                                          ACTION                         COMMENTS
---------- ------------- ---------- ------------------ -------------------------------------------------- ------------------------------ ------------------------------
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_1       TRUE                                               START "CHAIN_STEP_1"           Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_10      TRUE                                               START "CHAIN_STEP_10"          Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_2       TRUE                                               START "CHAIN_STEP_2"           Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_3       TRUE                                               START "CHAIN_STEP_3"           Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_4       TRUE                                               START "CHAIN_STEP_4"           Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_5       TRUE                                               START "CHAIN_STEP_5"           Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_6       TRUE                                               START "CHAIN_STEP_6"           Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_7       TRUE                                               START "CHAIN_STEP_7"           Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_8       TRUE                                               START "CHAIN_STEP_8"           Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_9       TRUE                                               START "CHAIN_STEP_9"           Sequence 1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_END     :CHAIN_STEP_LAST.state='SUCCEEDED'                 END                            End of chain TEST_CHAIN_1
SYSADM     TEST_CHAIN_1  SYSADM     CHAIN_STEP_LAST    :CHAIN_STEP_1.state='SUCCEEDED' AND :CHAIN_STEP_10 START "CHAIN_STEP_LAST"        Sequence 2
                                                       .state='SUCCEEDED' AND :CHAIN_STEP_2.state='SUCCEE
                                                       DED' AND :CHAIN_STEP_3.state='SUCCEEDED' AND :CHAI
                                                       N_STEP_4.state='SUCCEEDED' AND :CHAIN_STEP_5.state
                                                       ='SUCCEEDED' AND :CHAIN_STEP_6.state='SUCCEEDED' A
                                                       ND :CHAIN_STEP_7.state='SUCCEEDED' AND :CHAIN_STEP
                                                       _8.state='SUCCEEDED' AND :CHAIN_STEP_9.state='SUCC
                                                       EEDED'

Executing the Chain 

Simply enable the job to execute the chain. The job created by this PL/SQL will only execute the chain once because, by default, it is automatically dropped after it completes.
exec DBMS_SCHEDULER.enable ('test_chain_1_job');
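If, instead, the chain needed to run on a schedule, the job could be created with a repeat interval and AUTO_DROP disabled (a hedged sketch; the job name and calendar string below are purely illustrative).
BEGIN
  dbms_scheduler.create_job (
    job_name        => 'TEST_CHAIN_1_NIGHTLY_JOB', --hypothetical job name
    job_type        => 'CHAIN',
    job_action      => 'TEST_CHAIN_1',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',      --illustrative schedule
    auto_drop       => FALSE,
    enabled         => TRUE);
END;
/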

Monitoring the Chain 

Oracle also provides views to monitor running jobs and chains.  ALL_SCHEDULER_RUNNING_CHAINS reports the current status of each step in the chain.
SELECT owner,job_name,chain_owner,chain_name,step_name,state 
FROM   all_scheduler_running_chains ORDER BY owner, job_name, chain_name, step_name;

                                    Chain
OWNER      JOB_NAME                 Owner      CHAIN_NAME           STEP_NAME                 STATE
---------- ------------------------ ---------- -------------------- ------------------------- ---------------
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_1              SUCCEEDED
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_10             RUNNING
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_2              SUCCEEDED
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_3              SUCCEEDED
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_4              SUCCEEDED
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_5              RUNNING
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_6              SUCCEEDED
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_7              RUNNING
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_8              RUNNING
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_9              SUCCEEDED
SYSADM     TEST_CHAIN_1_JOB         SYSADM     TEST_CHAIN_1         CHAIN_STEP_LAST           NOT_STARTED
You can also see each completed job and sub-job in ALL_SCHEDULER_JOB_RUN_DETAILS.
select log_id, job_name, job_subname, req_start_date, actual_start_date, log_date, run_duration, output
from all_scheduler_job_run_details 
where job_name like 'TEST_CHAIN%' 
AND log_date > sysdate-…
order by actual_start_date;

    LOG_ID JOB_NAME                 JOB_SUBNAME              REQ_START_DATE                 ACTUAL_START_DATE              LOG_DATE                       RUN_DURATION    OUTPUT
---------- ------------------------ ------------------------ ------------------------------ ------------------------------ ------------------------------ --------------- ----------------------------------------
   7942016 TEST_CHAIN_1_JOB                                  26/12/2023 15:54:15.720 +00:00 26/12/2023 15:54:18.970 +00:00 26/12/2023 11:03:05.901 -05:00 +000 00:08:47
   7941800 TEST_CHAIN_1_JOB         CHAIN_STEP_9             26/12/2023 15:54:19.279 +00:00 26/12/2023 15:54:19.494 +00:00 26/12/2023 10:55:49.599 -05:00 +000 00:01:30   TEST_PROCEDURE.CHAIN_STEP_9:90
   7941892 TEST_CHAIN_1_JOB         CHAIN_STEP_4             26/12/2023 15:54:19.633 +00:00 26/12/2023 15:56:00.613 +00:00 26/12/2023 10:56:40.671 -05:00 +000 00:00:40   TEST_PROCEDURE.CHAIN_STEP_4:40
   7941894 TEST_CHAIN_1_JOB         CHAIN_STEP_2             26/12/2023 15:54:19.651 +00:00 26/12/2023 15:56:00.615 +00:00 26/12/2023 10:56:20.832 -05:00 +000 00:00:20   TEST_PROCEDURE.CHAIN_STEP_2:20
   7941906 TEST_CHAIN_1_JOB         CHAIN_STEP_3             26/12/2023 15:54:19.639 +00:00 26/12/2023 15:56:05.946 +00:00 26/12/2023 10:56:36.206 -05:00 +000 00:00:30   TEST_PROCEDURE.CHAIN_STEP_3:30
   7941952 TEST_CHAIN_1_JOB         CHAIN_STEP_6             26/12/2023 15:54:19.620 +00:00 26/12/2023 15:56:36.438 +00:00 26/12/2023 10:57:36.608 -05:00 +000 00:01:00   TEST_PROCEDURE.CHAIN_STEP_6:60
   7941940 TEST_CHAIN_1_JOB         CHAIN_STEP_10            26/12/2023 15:54:19.261 +00:00 26/12/2023 15:57:37.626 +00:00 26/12/2023 10:59:17.691 -05:00 +000 00:01:40   TEST_PROCEDURE.CHAIN_STEP_10:100
   7942000 TEST_CHAIN_1_JOB         CHAIN_STEP_8             26/12/2023 15:54:19.388 +00:00 26/12/2023 15:59:18.696 +00:00 26/12/2023 11:00:38.840 -05:00 +000 00:01:20   TEST_PROCEDURE.CHAIN_STEP_8:80
   7942032 TEST_CHAIN_1_JOB         CHAIN_STEP_5             26/12/2023 15:54:19.628 +00:00 26/12/2023 16:00:48.614 +00:00 26/12/2023 11:01:38.783 -05:00 +000 00:00:50   TEST_PROCEDURE.CHAIN_STEP_5:50
   7942036 TEST_CHAIN_1_JOB         CHAIN_STEP_7             26/12/2023 15:54:19.500 +00:00 26/12/2023 16:01:44.374 +00:00 26/12/2023 11:02:54.432 -05:00 +000 00:01:10   TEST_PROCEDURE.CHAIN_STEP_7:70
   7942014 TEST_CHAIN_1_JOB         CHAIN_STEP_LAST          26/12/2023 16:02:54.564 +00:00 26/12/2023 16:02:59.714 +00:00 26/12/2023 11:03:00.729 -05:00 +000 00:00:01   TEST_PROCEDURE.CHAIN_STEP_LAST:1
The start and end times of each step are also recorded on the parameter table by TEST_PROCEDURE.
select * from test_chain
where chain_name = 'TEST_CHAIN_1'
order by seq, resource_units;


SEQ CHAIN_NAME      STEP_NAME            PROGRAM_NAME      PROGRAM_ACTION Units Prio CONDITION                                          END Secs. BEGINDTTM                ENDDTTM
--- --------------- -------------------- ----------------- -------------- ----- ---- -------------------------------------------------- --- ----- ------------------------ ------------------------
  1 TEST_CHAIN_1    CHAIN_STEP_1         TEST_PROGRAM_1    TEST_PROCEDURE     1    5 TRUE                                               N      10 26/12/2023 10:54:19.737  26/12/2023 10:54:29.791
  1 TEST_CHAIN_1    CHAIN_STEP_2         TEST_PROGRAM_2    TEST_PROCEDURE     2    5 TRUE                                               N      20 26/12/2023 10:56:00.636  26/12/2023 10:56:20.828
  1 TEST_CHAIN_1    CHAIN_STEP_3         TEST_PROGRAM_3    TEST_PROCEDURE     3    4 TRUE                                               N      30 26/12/2023 10:56:05.960  26/12/2023 10:56:36.188
  1 TEST_CHAIN_1    CHAIN_STEP_4         TEST_PROGRAM_4    TEST_PROCEDURE     4    4 TRUE                                               N      40 26/12/2023 10:56:00.626  26/12/2023 10:56:40.667
  1 TEST_CHAIN_1    CHAIN_STEP_5         TEST_PROGRAM_5    TEST_PROCEDURE     5    3 TRUE                                               N      50 26/12/2023 11:00:48.621  26/12/2023 11:01:38.779
  1 TEST_CHAIN_1    CHAIN_STEP_6         TEST_PROGRAM_6    TEST_PROCEDURE     6    3 TRUE                                               N      60 26/12/2023 10:56:36.443  26/12/2023 10:57:36.604
  1 TEST_CHAIN_1    CHAIN_STEP_7         TEST_PROGRAM_7    TEST_PROCEDURE     7    2 TRUE                                               N      70 26/12/2023 11:01:44.378  26/12/2023 11:02:54.428
  1 TEST_CHAIN_1    CHAIN_STEP_8         TEST_PROGRAM_8    TEST_PROCEDURE     8    2 TRUE                                               N      80 26/12/2023 10:59:18.702  26/12/2023 11:00:38.837
  1 TEST_CHAIN_1    CHAIN_STEP_9         TEST_PROGRAM_9    TEST_PROCEDURE     9    1 TRUE                                               N      90 26/12/2023 10:54:19.546  26/12/2023 10:55:49.596
  1 TEST_CHAIN_1    CHAIN_STEP_10        TEST_PROGRAM_10   TEST_PROCEDURE    10    1 TRUE                                               N     100 26/12/2023 10:57:37.640  26/12/2023 10:59:17.687
  2 TEST_CHAIN_1    CHAIN_STEP_LAST      TEST_PROGRAM_LAST TEST_PROCEDURE          3 :CHAIN_STEP_1.state='SUCCEEDED' AND :CHAIN_STEP_10 N       1 26/12/2023 11:02:59.722  26/12/2023 11:03:00.725
                                                                                     .state='SUCCEEDED' AND :CHAIN_STEP_2.state='SUCCEE
                                                                                     DED' AND :CHAIN_STEP_3.state='SUCCEEDED' AND :CHAI
                                                                                     N_STEP_4.state='SUCCEEDED' AND :CHAIN_STEP_5.state
                                                                                     ='SUCCEEDED' AND :CHAIN_STEP_6.state='SUCCEEDED' A
                                                                                     ND :CHAIN_STEP_7.state='SUCCEEDED' AND :CHAIN_STEP
                                                                                     _8.state='SUCCEEDED' AND :CHAIN_STEP_9.state='SUCC
                                                                                     EEDED'

  3 TEST_CHAIN_1  CHAIN_STEP_END                                                   3 :CHAIN_STEP_LAST.state='SUCCEEDED'                 Y       1
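Because TEST_PROCEDURE records BEGINDTTM and ENDDTTM, the elapsed time of each step can also be derived directly from the parameter table (a simple sketch).
select step_name, seconds
,      60*extract(minute from (enddttm-begindttm))
        + extract(second from (enddttm-begindttm)) elapsed_seconds
from   test_chain
where  chain_name = 'TEST_CHAIN_1'
and    enddttm is not null
order by seq, resource_units;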

Acknowledgments 

All of this can be worked out from the Oracle documentation, but I have found these pages very helpful:
  • Tim Hall's Oracle-Base.com
  •