Friday, September 13, 2024

Purging Standard and Unified Audit Data

Introduction

It is not uncommon for businesses to have policies to retain audit data for several years, but it ought to follow that the data is purged at the end of the retention period.  A regular purge process should be put in place; otherwise, audit data will accumulate.  However, it is all too common for the purge to be deferred, often indefinitely, until it becomes a problem.

I have been working on a system that has built up 12 years of audit data.  It was starting to consume significant space in the SYSAUX tablespace, and queries on the standard audit data had become slow.  We noticed that Oracle Enterprise Manager was regularly querying this data looking for recent failed logins.  This was performing so poorly that it had become a never-ending process.

This company has a stated requirement that audit data be retained for 7 years, but some can be purged after 2 years.  However, within the last 7 years, this production database has had different database IDs due to its upgrade history.  

Oracle delivers a PL/SQL package to manage audit purges.  However, this utility will only purge the current database ID, and it will only purge all of that data by a date.  It cannot selectively purge some data by other criteria.  Therefore, a custom purge utility that extends that package was required.  This post discusses that extension. 

We introduced

  • Regular weekly purge jobs for all forms of audit, for all DBIDs.  They run every weekend when the system is quieter.
  • This includes a custom purge job to delete some data after 2 years

General Recommendations

  • Standard (traditional) audit is deprecated from Oracle 21c and is desupported in 23ai.  Therefore, Oracle recommends that customers migrate from standard audit to Unified Audit.  Oracle provides a Traditional to Unified Audit Syntax Converter - Generate Unified Audit Policies from Current Traditional Audit Configuration (Doc ID 2909718.1).  Note that this will only migrate the audit configuration.  The standard audit data that has built up will still need to be purged according to the retention policy until it is all gone.
  • Should any future database maintenance/upgrade activity cause the database ID to change, a new purge timestamp will be required for the new database ID, and an additional purge job will be required for the new legacy database ID.
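Before planning a migration, it can help to confirm which audit mode the database is running in and which unified audit policies are already enabled.  The following is a minimal sketch, assuming a user with access to V$OPTION and the AUDIT_UNIFIED_ENABLED_POLICIES view.

```sql
-- TRUE  = pure unified auditing
-- FALSE = mixed mode: traditional audit settings still apply,
--         and unified audit policies can also be enabled
select value
from   v$option
where  parameter = 'Unified Auditing'
/

-- Unified audit policies that are currently enabled
select *
from   audit_unified_enabled_policies
/
```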

Overview

Standard audit and Unified Audit (introduced in 12c) can be configured to monitor certain operations by database users.  Audit data is generally written to tables in the SYSAUX tablespace, but unified audit can be written to XML files outside the database.

  • Standard audit data is written to SYS.AUD$.  It is exposed via DBA_AUDIT_TRAIL.  Oracle does not support partitioning this table, and users who have tried have reported errors.
  • Unified audit data is written to AUDSYS.AUD$UNIFIED. It is exposed via UNIFIED_AUDIT_TRAIL. Unified Audit is enabled by default, so if you think you are only using standard audit, you are probably also writing some Unified Audit.  This table is interval range partitioned.  The default interval is one calendar month, but it can be altered.
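As an illustration of the partition interval mentioned above, the sketch below reports the current interval of the unified audit table and then changes it with DBMS_AUDIT_MGMT.ALTER_PARTITION_INTERVAL.  The choice of a daily interval is only an example, and I assume the change applies to partitions created after the call.

```sql
-- Current partitioning scheme of the unified audit table
select partitioning_type, interval
from   dba_part_tables
where  owner      = 'AUDSYS'
and    table_name = 'AUD$UNIFIED'
/

-- Change the interval to 1 day (illustrative value)
begin
  dbms_audit_mgmt.alter_partition_interval
  (interval_number    => 1
  ,interval_frequency => 'DAY'
  );
end;
/
```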

Purging Audit

Audit Timestamps

Oracle's delivered audit management package DBMS_AUDIT_MGMT can purge various types of audit.  Each audit purge works on an individual database ID, the current database ID by default.  An audit purge timestamp is created for each type of audit, and for each DBID to be purged.  It is defined as a fixed date, not a retention period.  Audit data older than that date is purged.  Therefore, the audit timestamps must be updated before each purge to maintain the desired retention period.

The first step is to identify the DBIDs.  In my case, most of the data is standard audit.  I will simply query the number of rows and the range of dates for each DBID in the audit data.

select dbid, count(*), min(timestamp), max(timestamp)
from dba_audit_trail
group by dbid
order by 3
/

      DBID   COUNT(*) MIN(TIMESTAMP)      MAX(TIMESTAMP)
---------- ---------- ------------------- -------------------
1000000002   33606824 11/09/2012 00:06:42 20/08/2016 00:46:22
1000000003  241327475 20/08/2016 03:36:36 13/05/2023 04:47:20
1000000005   23320615 12/05/2023 23:54:18 06/01/2024 00:56:49
1000000006     134188 06/01/2024 01:05:29 04/03/2024 22:24:53
2000000008     282780 05/03/2024 03:39:39 02/05/2024 15:05:41

However, some unified audit is enabled by default, so both types of audit have to be purged.

select dbid, count(*), min(event_timestamp), max(event_timestamp)
from unified_audit_trail
group by dbid
order by 3
/

      DBID   COUNT(*) MIN(EVENT_TIMESTAMP MAX(EVENT_TIMESTAMP
---------- ---------- ------------------- -------------------
1000000004      28323 21/04/2023 10:09:39 21/04/2023 12:00:51
1000000005    1434181 12/05/2023 22:44:47 06/01/2024 00:56:49
1000000006    1700032 06/01/2024 01:05:29 04/03/2024 22:02:29
2000000008     263038 05/03/2024 03:12:27 07/05/2024 16:39:41

The following PL/SQL creates an audit timestamp for each database ID, for each type of audit.  The list of database IDs obtained above was then hard-coded in the following PL/SQL block.  Doing this dynamically can involve a long query on the audit data, so it is easier to hard code them.

If a database ID is not specified to DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP, a control record for the current database ID is created. The audit timestamp is set to 7 years ago because our policy is to purge any audit data older than 7 years.  Later, a database scheduler job will be created for each timestamp defined.

At the time of writing, it will be several years until the Unified Audit data is old enough to be purged, but if the purge jobs are not created now, nobody will remember to add them later!

declare
  k_standard_retention_years CONSTANT INTEGER NOT NULL := 7;
begin
  FOR i IN(
    with d (dbid) as (
          select TO_NUMBER(NULL) from dual
    union select 1000000001 from dual 
    union select 1000000002 from dual 
    union select 1000000003 from dual 
    union select 1000000004 from dual 
    union select 1000000005 from dual 
    union select 1000000006 from dual 
    )
    select dbid from d
  ) LOOP
    DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP
    (audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD
    ,last_archive_time => ADD_MONTHS(sysdate,-12*k_standard_retention_years)
    ,database_id => i.dbid
    );
  END LOOP;

  FOR i IN(
    with d (dbid) as (
          select TO_NUMBER(NULL) from dual
    union select 1000000004 from dual --2023 unified only
    union select 1000000005 from dual --2023-24 --current prod +unified
    union select 1000000006 from dual --2024 +unified
    --union select 2000000008 from dual --2024 +unified --current PRT
    )
    select dbid from d
  ) LOOP
    DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP
    (audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_UNIFIED
    ,last_archive_time => ADD_MONTHS(sysdate,-12*k_standard_retention_years)
    ,database_id => i.dbid
    );
  end loop;
end;
/
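Once the timestamps have been set, they can be checked in DBA_AUDIT_MGMT_LAST_ARCH_TS, and a single purge can be run by hand for one database ID.  This is only a sketch.  It assumes that cleanup has already been initialised for standard audit, and the DBID is one of the legacy values from the queries above.

```sql
-- Purge timestamps currently defined, per audit trail and database ID
select audit_trail, database_id, last_archive_ts
from   dba_audit_mgmt_last_arch_ts
order by audit_trail, database_id
/

-- One-off purge of standard audit for one legacy DBID,
-- deleting only the rows older than its purge timestamp
begin
  dbms_audit_mgmt.clean_audit_trail
  (audit_trail_type        => dbms_audit_mgmt.audit_trail_aud_std
  ,use_last_arch_timestamp => TRUE
  ,database_id             => 1000000002
  );
end;
/
```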

As old database IDs are completely purged, their last archive timestamps can be removed from the audit purge configuration in the database with DBMS_AUDIT_MGMT.CLEAR_LAST_ARCHIVE_TIMESTAMP, and the corresponding database jobs can be dropped.
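A minimal sketch of that clean-up for one fully purged DBID might look like this.  The job name is hypothetical, and DROP_PURGE_JOB only applies to jobs that were created with DBMS_AUDIT_MGMT.CREATE_PURGE_JOB.

```sql
begin
  -- Remove the purge timestamp for the legacy DBID
  dbms_audit_mgmt.clear_last_archive_timestamp
  (audit_trail_type => dbms_audit_mgmt.audit_trail_aud_std
  ,database_id      => 1000000002
  );

  -- Drop the purge job that served that DBID (hypothetical job name)
  dbms_audit_mgmt.drop_purge_job
  (audit_trail_purge_name => 'XX_STD_AUDIT_PURGE_1000000002'
  );
end;
/
```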

Should any future database maintenance/migration/upgrade activity cause the database ID to change again, new timestamps must be set up for the new current database ID, and corresponding scheduler purge jobs must be created.

Custom Audit Purge Management Package

A custom package, XX_CUSTOM_AUDIT_PURGE, has been created.  It is available on GitHub.  The package is deliberately owned by the AUDSYS user, which is also used to run the default audit purge jobs.  That account is delivered by Oracle and is locked.  It should not be unlocked.  Therefore, the package must be deployed by a user connected as SYSDBA.

This package contains various procedures

  • INIT_AUDIT_PURGE
    • initialises the standard and fine-grained audit purges,
    • creates a job resource constraint so the scheduler only runs one purge process at a time, and 
    • creates a job class to assign the job to a resource plan consumer group so that the purge job runs at low priority.
  • UPDATE_AUDIT_PURGE_TS updates the audit purge timestamps to 7 years ago, thus maintaining a rolling 7-year purge.
  • CUSTOM_PURGE performs a limited purge of some audit data after 2 years.  Exactly which audit actions are purged is hard-coded in this procedure.
  • CREATE_AUDIT_PURGE_JOBS uses DBMS_AUDIT_MGMT.CREATE_PURGE_JOB to create the various scheduler jobs to run the Oracle delivered purge procedure DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL for the various audits to be purged. 
    • However, this procedure then alters the jobs to control their schedule, concurrency and resource manager consumer group.
  • CREATE_CUSTOM_PURGE_JOB creates a scheduler job to execute the CUSTOM_PURGE procedure.
  • CREATE_UPDATE_AUDIT_PURGE_TS_JOB creates a job on the database scheduler to run UPDATE_AUDIT_PURGE_TS.
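For readers who have not used the delivered package, the underlying Oracle calls that these procedures wrap can be sketched as follows.  This is not the code from the package on GitHub.  The job name and the 168-hour (weekly) interval are assumptions for illustration, and the job created this way purges the current database ID only.  Be aware that INIT_CLEANUP may relocate the standard audit table to SYSAUX if it is still in SYSTEM.

```sql
begin
  -- Standard audit must be initialised for cleanup before it can be purged
  if not dbms_audit_mgmt.is_cleanup_initialized
         (audit_trail_type => dbms_audit_mgmt.audit_trail_aud_std) then
    dbms_audit_mgmt.init_cleanup
    (audit_trail_type         => dbms_audit_mgmt.audit_trail_aud_std
    ,default_cleanup_interval => 168 -- hours
    );
  end if;

  -- Purge job that deletes rows older than the purge timestamp
  dbms_audit_mgmt.create_purge_job
  (audit_trail_type           => dbms_audit_mgmt.audit_trail_aud_std
  ,audit_trail_purge_interval => 168 -- hours
  ,audit_trail_purge_name     => 'XX_STD_AUDIT_PURGE' -- hypothetical name
  ,use_last_arch_timestamp    => TRUE
  );
end;
/
```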

Some constants and defaults have been hard-coded in the package including

  • The 2-year and 7-year purge rules.  
  • Resource consumer group of the purge jobs
  • Job Class for purge jobs to which resource consumer group is assigned.
  • Number of rows to be deleted between commits during custom delete (2-year purge).
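To illustrate the idea of deleting in batches with a commit between each, here is a minimal sketch of a 2-year selective delete.  It is not the CUSTOM_PURGE procedure itself.  The batch size and the list of audit actions (100 and 101 are LOGON and LOGOFF) are assumptions chosen for the example, and NTIMESTAMP# is stored in UTC.

```sql
declare
  k_custom_retention_years constant integer := 2;
  k_batch_size             constant integer := 10000; -- hypothetical value
  l_cutoff                 timestamp := add_months(sysdate, -12*k_custom_retention_years);
begin
  loop
    -- Delete at most one batch of qualifying rows
    delete from sys.aud$
    where  ntimestamp# < l_cutoff
    and    action# in (100, 101) -- illustrative only: LOGON, LOGOFF
    and    rownum <= k_batch_size;

    exit when sql%rowcount = 0;
    commit; -- release undo and locks between batches
  end loop;
  commit;
end;
/
```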

The audit purge requirements are unlikely to change.  If they do, a code change is required, and it must be installed by the core DBA.  I felt that this approach was easier and more secure than creating and controlling access to metadata.

After an initial large purge to clear the backlog, we find that these jobs only run for a few minutes each.  By scheduling them to run regularly, we maintain a relatively steady data volume.

Reclaiming Space after Initial Large Purge

The audit purge is no more than a simple delete.  Once deleted (and committed) the space freed up can be used by other rows of audit data, but space is not released from the audit tables back to the SYSAUX tablespace.
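The space still allocated to the audit tables can be seen in DBA_SEGMENTS, before and after the purge.  This sketch reports only the table segments (and partitions), not any associated LOB or index segments.

```sql
select owner, segment_name, tablespace_name
,      round(sum(bytes)/1024/1024) mb
from   dba_segments
where  (owner, segment_name) in (('SYS','AUD$'),('AUDSYS','AUD$UNIFIED'))
group by owner, segment_name, tablespace_name
order by owner, segment_name
/
```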

The only supported way to recover space from the purged audit tables is to rebuild them by moving them to a different tablespace with the DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION procedure and, if necessary, moving them back by calling the same procedure again.

Create a tablespace SYSAUX_AUD.  It should be created as a BIGFILE tablespace as SYSAUX is also a BIGFILE tablespace.  Otherwise, in our case, it would have needed at least 3 data files (so it could expand to 96G).
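A minimal sketch of the tablespace creation follows.  It assumes Oracle Managed Files (DB_CREATE_FILE_DEST is set), so no data file name is given, and the initial and incremental sizes are illustrative.

```sql
create bigfile tablespace sysaux_aud
datafile size 1g autoextend on next 1g maxsize 96g
/
```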

The following PL/SQL block creates a database scheduler job for each call to this procedure so that it can run in the background.

  • This time the jobs are one-time execution jobs that will be auto-dropped on completion.
  • The same resource constraint XX_PURGE_RESOURCE, as used by the other purge jobs, ensures that it will not run concurrently with any of those jobs (nor vice versa).
  • The job start date is set to the current time, so the job will run as soon as it is enabled, provided no other job using the same resource is running.

DECLARE
  TYPE t_aud_types IS TABLE OF VARCHAR2(8);
  a_aud_types t_aud_types;
  l_job_name VARCHAR2(30);
  l_tablespace_name VARCHAR2(30) := 'SYSAUX_AUD';
  e_job_does_not_exist EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_job_does_not_exist, -27475);
BEGIN
  a_aud_types := t_aud_types('AUD_STD','UNIFIED') /*FGA_STD deliberately excluded*/;
  FOR i IN a_aud_types.FIRST..a_aud_types.LAST
  LOOP
    l_job_name := 'XX_AUDIT_TRAIL_LOC_'||a_aud_types(i);
    BEGIN
      dbms_scheduler.drop_job(job_name => l_job_name);
    EXCEPTION WHEN e_job_does_not_exist THEN NULL;
    END;
    dbms_scheduler.create_job
    (job_name => l_job_name
    ,job_type => 'PLSQL_BLOCK'
    ,job_action => 'BEGIN dbms_audit_mgmt.set_audit_trail_location(audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_'||a_aud_types(i)||',audit_trail_location_value => '''||l_tablespace_name||''');  END;'
    ,start_date => SYSTIMESTAMP at time zone ('US/Eastern') 
    ,job_class => 'XX_PURGE_CLASS'
    ,auto_drop => TRUE
    ,enabled => FALSE);
    DBMS_SCHEDULER.set_resource_constraint --Only one purge job can run at any one time
    (object_name   => l_job_name
    ,resource_name => 'AUDSYS.XX_PURGE_RESOURCE'
    ,units         => 1);     
    dbms_scheduler.enable(l_job_name);
  END LOOP;     
END;
/

  • NB: I specified the time zone for the database job because I am working in a different time zone to the database!
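The block above relies on the scheduler resource and job class already being in place.  In our case, they are created by the custom package.  As a rough sketch of the underlying DBMS_SCHEDULER calls, and not the package code itself, they could be created like this.  The use of the predefined LOW_GROUP consumer group is an assumption for the example.

```sql
begin
  -- A resource with a single unit: jobs that each require 1 unit cannot overlap
  dbms_scheduler.create_resource
  (resource_name => 'AUDSYS.XX_PURGE_RESOURCE'
  ,units         => 1
  ,comments      => 'Serialise audit purge jobs'
  );

  -- Job class that maps the purge jobs to a low-priority consumer group
  dbms_scheduler.create_job_class
  (job_class_name          => 'XX_PURGE_CLASS'
  ,resource_consumer_group => 'LOW_GROUP' -- assumed consumer group
  ,comments                => 'Audit purge jobs'
  );
end;
/
```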

We have found that due to a bug, the unified audit table is not relocated to the new tablespace.  It is expected that this will be fixed in version 23ai. 
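Whether each audit table has actually moved can be checked with queries such as these.  This is a sketch that assumes access to the DBA views.

```sql
-- Tablespace configured for each audit trail
select audit_trail, parameter_name, parameter_value
from   dba_audit_mgmt_config_params
where  parameter_name like '%TABLESPACE%'
/

-- Where the standard audit table is stored
select owner, table_name, tablespace_name
from   dba_tables
where  owner      = 'SYS'
and    table_name = 'AUD$'
/

-- Where the partitions of the unified audit table are stored
select tablespace_name, count(*) partitions
from   dba_tab_partitions
where  table_owner = 'AUDSYS'
and    table_name  = 'AUD$UNIFIED'
group by tablespace_name
/
```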

Validate/Reinstate Privileges

Using SET_AUDIT_TRAIL_LOCATION will strip off any explicitly granted privileges on the audit tables. Ensure privileges on the standard audit tables are intact.  Otherwise, the custom purge will fail.

column grantee format a8
column owner format a8
column table_name format a20
column privilege format a20
column grantor format a8
select * from dba_tab_privs
where (owner,table_name) IN(('SYS','FGA_LOG$'),('SYS','AUD$'),('AUDSYS','AUD$UNIFIED'))
/

Minimum expected output

GRANTEE  OWNER    TABLE_NAME           GRANTOR  PRIVILEGE            GRA HIE COM TYPE                     INH
-------- -------- -------------------- -------- -------------------- --- --- --- ------------------------ ---
AUDSYS   SYS      AUD$                 SYS      DELETE               NO  NO  NO  TABLE                    NO 
AUDSYS   SYS      AUD$                 SYS      SELECT               NO  NO  NO  TABLE                    NO
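If those privileges are missing, they can be reinstated by a user connected as SYSDBA.  This sketch covers only the two grants in the minimum expected output above.

```sql
grant select, delete on sys.aud$ to audsys
/
```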

TL;DR

If you collect audit data, then you should create a retention policy and schedule purge jobs to delete it after the retention period expires.  It is not unreasonable to keep audit data for several years.  The database ID may change in that time due to upgrade or migration.  Purge processes are also required for each legacy DBID.

After an initial large purge, reclaim space by relocating the Standard Audit table, but remember to reinstate any explicitly granted privileges.  

Unified Audit cannot be relocated until a bug is resolved in 23ai.  

The performance of purge and query of standard audit will improve after doing this.

Thursday, April 11, 2024

Configuring Shared Global Area (SGA) in a Multitenant Database

I have been working on a PeopleSoft Financials application that we have converted from a stand-alone database to be the only pluggable database (PDB) in an Oracle 19c container database (CDB).  We have been getting ORA-4031 (unable to allocate shared memory) errors in the PeopleSoft application.  

It has taken a while to solve and test, and I have to acknowledge quite a lot of advice from my friends.  

If you are wondering why you should be involved with your local Oracle user group and regularly attend its meetings, this is an example: you can ask people who have experience of different systems in different situations that you haven't encountered yet!

Documentation

I am going to look at 6 initialisation parameters that control the use of SGA.  The Oracle documentation, even in 21c, suggests they can mostly be set at CDB and PDB levels.  However, more recent Oracle guidance confirmed by my own experience suggests that is not a good idea.

  • SGA_MAX_SIZE can only be set at CDB level.  It sets the size of the shared memory segment that is the SGA.  It cannot be changed during the life of the database instance.  
    • Recommendation:
      • It can be useful to set it higher than SGA_TARGET if you plan either to increase SGA_TARGET, or add PDBs to the CDB, without restarting the instances.
  • SGA_TARGET "specifies the total size of all SGA components".  Use this parameter to control the memory usage of each PDB.  The setting at CDB must be at least the sum of the settings for each PDB.
    • Recommendations:
      • Use only this parameter at PDB level to manage the memory consumption of the PDB.
      • In a CDB with only a single PDB, set SGA_TARGET to the same value at CDB and PDB levels.  
      • Therefore, where there are multiple PDBs, SGA_TARGET at CDB level should be set to the sum of the settings for each PDB.  However, I haven't tested this yet.
      • There is no recommendation to reserve SGA for use by the CDB only, nor in my experience is there any need so to do.
  • SHARED_POOL_SIZE sets the minimum amount of shared memory reserved to the shared pool.  It can optionally be set in a PDB.  
    • Recommendation: Do not set SHARED_POOL_SIZE at PDB level.  It can be set at CDB level.
  • DB_CACHE_SIZE sets the minimum amount of shared memory reserved to the buffer cache. It can optionally be set in a PDB.  
    • Recommendation: Do not set DB_CACHE_SIZE at PDB level.  It can be set at CDB level.
  • SGA_MIN_SIZE has no effect at CDB level.  It can be set at PDB level at up to half of the manageable SGA.
    • Recommendation: Do not set SGA_MIN_SIZE.
  • INMEMORY_SIZE: If you are using in-memory query, this must be set at CDB level in order to reserve memory for the in-memory store.  The parameter defaults to 0, in which case in-memory query is not available.  The in-memory pool is not managed by Automatic Shared Memory Management (ASMM), but it does count toward the total SGA used in SGA_TARGET.
    • Recommendation: Therefore it must also be set in the PDB where in-memory is being used, otherwise we found (contrary to the documentation) that the parameter defaults to 0, and in-memory query will be disabled in that PDB.
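To see where these six parameters are currently set, a query along the following lines can be run from the root container.  This is a sketch.  I am assuming that rows with CON_ID 0 show the CDB-level values and that rows with a PDB's CON_ID show values set explicitly in that PDB.

```sql
select con_id, name, display_value, ispdb_modifiable
from   v$system_parameter
where  name in ('sga_max_size','sga_target','sga_min_size'
               ,'shared_pool_size','db_cache_size','inmemory_size')
order by name, con_id
/
```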

Oracle Notes

There are a lot of Oracle support notes on the subject of SGA management in a multi-tenant database.  The older notes talk about setting memory parameters in the PDB, while a later note and a bug advise setting these parameters only at CDB level, and not at all in the PDB.

  • About memory configuration parameter on each PDBs (Doc ID 2655314.1), November 2023
    • As a best practice, please do not to set SHARED_POOL_SIZE and DB_CACHE_SIZE on each PDBs and please manage automatically by setting SGA_TARGET.
    • "This best practice is confirmed by development in Bug 30692720"
    • Bug 30692720 discusses how the parameters are validated.  Eg. "Sum(PDB sga size) > CDB sga size"
    • Bug 34079542: "Unset sga_min_size parameter in PDB."

SGA Management with a Parse Intensive System (PeopleSoft).

PeopleSoft systems dynamically generate lots of non-shareable SQL.  This leads to a lot of parsing and consumes more shared pool.  ASMM can respond by shrinking the buffer cache and growing the shared pool.  However, this can lead to more physical I/O and degrade performance.  Nor is it beneficial for the database to cache dynamic SQL statements that are not going to be executed again.  Other parse-intensive systems can also exhibit this behaviour.

In PeopleSoft, I normally set DB_CACHE_SIZE and SHARED_POOL_SIZE to minimum values to stop ASMM shuffling too far in either direction.  With a large SGA, moving memory between these pools can become a performance problem in its own right.  
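The extent to which ASMM has been moving memory between the two pools can be seen in V$SGA_DYNAMIC_COMPONENTS.  The following is a simple sketch that reports the current, minimum and user-specified sizes along with the number of resize operations.

```sql
select component
,      round(current_size/1024/1024)        current_mb
,      round(min_size/1024/1024)            min_mb
,      round(user_specified_size/1024/1024) user_mb
,      oper_count, last_oper_type, last_oper_time
from   v$sga_dynamic_components
where  component in ('shared pool','DEFAULT buffer cache')
/
```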

We removed SHARED_POOL_SIZE, DB_CACHE_SIZE and SGA_MIN_SIZE settings from the PDB.  The only SGA parameters set at PDB level are SGA_TARGET and INMEMORY_SIZE.  We have found it is safe to reduce the setting of SGA_TARGET at PDB level, but reducing at CDB level without also restarting the instance has caused problems.
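As a sketch of how the PDB-level settings might be removed, the commands below are run inside the PDB.  The PDB name and the SGA_TARGET value are hypothetical.  I have used SCOPE=SPFILE, so I would expect the resets to take effect when the PDB is next opened, and a reset will raise an error if the parameter was never set in the PDB.

```sql
-- Hypothetical PDB name
alter session set container = FINPDB;

-- Remove PDB-level settings so that only the CDB-level values apply
alter system reset shared_pool_size scope = spfile;
alter system reset db_cache_size    scope = spfile;
alter system reset sga_min_size     scope = spfile;

-- SGA_TARGET remains the control at PDB level (illustrative value)
alter system set sga_target = 16G scope = both;
```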

SHARED_POOL_SIZE and DB_CACHE_SIZE are set as I usually would for PeopleSoft, but only at CDB level to guarantee a minimum buffer cache size.  

This is straightforward when there is only one PDB in the CDB.   I have yet to see what happens when I have another active PDB with a non-PeopleSoft system and a different kind of workload that puts less stress on the shared pool and more on the buffer cache.

TL;DR

  • Do not set any SGA parameter in a PDB other than SGA_TARGET and (if necessary) INMEMORY_SIZE.
  • Do not set DB_CACHE_SIZE, SHARED_POOL_SIZE at PDB level.  They can be set at CDB level. 
  • Do not set SGA_MIN_SIZE at either PDB or CDB level.


Thursday, February 22, 2024

Table Clusters: 6. Testing the Cluster & Conclusion (TL;DR)

This post is the last part of a series that discusses table clustering in Oracle.

  1. Introduction and Ancient History
  2. Cluster & Cluster Key Design Considerations
  3. Populating the Cluster with DBMS_PARALLEL_EXECUTE
  4. Checking the Cluster Key
  5. Using the Cluster Key Index instead of the Primary/Unique Key Index
  6. Testing the Cluster & Conclusion (TL;DR)

Testing

We did get improved performance with the clustered tables.  More significantly, we encountered less inter-process contention, and so were able to run more concurrent processes, and the overall elapsed time of all the processes was reduced.

Looking at just the performance of the bulk delete statements on the result tables, there is a significant reduction in DB time and physical I/O time on the clustered tables.  The reduction in physical I/O is not only because the table is smaller, but because there is no need to perform consistent read recovery on the blocks, there are fewer reads from the undo segment and less CPU was consumed creating consistent read copies in the buffer cache.

Statement                                   Heap Table Clustered Table
------------------------------------------- ---------- ---------------
DELETE FROM PS_GP_RSLT_ACUM…
  DB Time (s)                                     2182            1662
  delete statement only db file sequential        1451             891
  CPU                                              941             531


Statement                                   Heap Table Clustered Table
------------------------------------------- ---------- ---------------
DELETE FROM PS_GP_RSLT_ABS…
  DB Time (s)                                      781             330
  delete statement only db file sequential         340             210
  CPU                                              300             120

GP_RSLT_PIN is another, albeit smaller, result table.  It is a candidate for clustering, but it was not clustered for this test and therefore did not show any significant improvement.  It was subsequently clustered.

Statement                                   Heap Table Heap in Cluster Test
------------------------------------------- ---------- --------------------
DELETE FROM PS_GP_RSLT_PIN…
  DB Time (s)                                      270                  250
  delete statement only db file sequential         110                  120
  CPU                                              110                   90

The execution plans for some queries on clustered tables changed to use the cluster key index which resulted in poorer performance.  I had to introduce some SQL profiles to reinstate the original execution plans.  
However, the execution plans for these delete statements also switched to the cluster key index resulting in improved performance.  So it depends.

Conclusion (TL;DR)

Table partitioning can help you find data efficiently by allowing the database to eliminate partitions that cannot contain the data.  However, you must be running Enterprise Edition and license the partitioning option.
Table clustering is effective when you are regularly querying data from multiple tables with similar keys, and you can store them in the same data blocks, thus saving the overhead of retrieving multiple blocks.  It is available on any Oracle database and does not require any additional licence.
Both partitioning and clustering can help avoid the overhead of read consistency by storing dissimilar data in different blocks.
Sometimes, using the cluster key index can result in worse performance than using the original indexes.  A SQL profile or SQL baseline may be needed to stabilise some execution plans.

Monday, February 19, 2024

Table Clusters: 5. Using the Cluster Key Index instead of the Primary/Unique Key Index

This post is part of a series that discusses table clustering in Oracle.

  1. Introduction and Ancient History
  2. Cluster & Cluster Key Design Considerations
  3. Populating the Cluster with DBMS_PARALLEL_EXECUTE
  4. Checking the Cluster Key
  5. Using the Cluster Key Index instead of the Primary/Unique Key Index
  6. Testing the Cluster & Conclusion (TL;DR)

In my test case, the cluster key index is made up of the first 7 columns of the unique key index.  One side-effect of this similarity of the keys is that the optimizer may choose to use the cluster key index where previously it used the unique index.  

The cluster key index is a unique index.  It contains only one entry for each distinct cluster key value that points to the first block that contains rows with those cluster key values.  As we saw in the previous post, there are many rows in the table for each distinct cluster key.  Therefore, the cluster key index is much smaller than the unique index on any table in the cluster.  This contributes to making it appear cheaper to access.

The clustering factor is fundamental to determining the cost of using an index.  It is a measure of how many I/Os the database would perform if it were to read every row in that table via the index in index order.  Notwithstanding that blocks may be cached, every time the scan changes to a different data block in the table, that is another I/O.  

In my case, the clustering factor of the cluster key index is also the same value as the number of rows and the number of distinct keys.  This is because I have set the cluster size equal to the block size so that each cluster key value points to a different block, and each block only contains rows for a single cluster key value.  The clustering factor of the cluster key index is much lower than that of the unique indexes, also making it look cheaper to access.

TABLE_NAME           INDEX_NAME               UNIQUENES PREFIX_LENGTH LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR
-------------------- ------------------------ --------- ------------- ----------- ------------- ---------- -----------------
PS_GP_RSLT_CLUSTER   PS_GP_RSLT_CLUSTER_IDX   UNIQUE                       111541       8875383    8875383           8875383
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS           UNIQUE                8     1271559     152019130  152019130          10806251
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM          UNIQUE                8     8421658     762210387  762210387         101166426
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN           UNIQUE                9     3894799     327189471  327189471          31774871
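Statistics like those shown above can be reported from the data dictionary with a query along these lines.  This is a sketch that assumes you are connected as the owner of the indexes.  For the cluster key index, TABLE_NAME reports the name of the cluster.

```sql
select table_name, index_name, uniqueness, prefix_length
,      leaf_blocks, distinct_keys, num_rows, clustering_factor
from   user_indexes
where  index_name in ('PS_GP_RSLT_CLUSTER_IDX','PS_GP_RSLT_ABS'
                     ,'PS_GP_RSLT_ACUM','PS_GP_RSLT_PIN')
order by table_name, index_name
/
```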

I still need to create the unique index on the tables to enforce uniqueness. I have found that the optimizer tends to choose the cluster key index in preference to the unique index. The cost of accessing the cluster key index is lower because it is smaller and has a lower clustering factor. When I increased the length of the cluster key from 3 to 7 columns, I also found that the size and clustering factor of the cluster key index increased, and the clustering factor for the unique indexes decreased, partly because the rows are less disordered with respect to the index key, and partly because the size of the table decreased because each cluster key is only stored once. Although this reduced the cost of accessing the unique indexes, I still find the optimizer tends to choose the cluster key index over the unique index.

Sometimes, the switch to the cluster key index is beneficial, but sometimes performance degrades, as in the case of this query.

SELECT … 
FROM PS_GP_RSLT_ACUM RA ,PS_GP_ACCUMULATOR A ,PS_GP_PYE_HIST_WRK H 
WHERE H.EMPLID BETWEEN :1 AND :2 AND H.CAL_RUN_ID=:3 
AND H.RUN_CNTL_ID=:4 AND H.OPRID=:5 
AND H.EMPLID=RA.EMPLID 
AND H.EMPL_RCD=RA.EMPL_RCD 
AND H.GP_PAYGROUP=RA.GP_PAYGROUP 
AND H.CAL_ID=RA.CAL_ID 
AND H.ORIG_CAL_RUN_ID=RA.ORIG_CAL_RUN_ID 
AND H.HIST_CAL_RUN_ID=RA.CAL_RUN_ID 
AND H.RSLT_SEG_NUM=RA.RSLT_SEG_NUM 
AND RA.PIN_NUM=A.PIN_NUM 
AND RA.ACM_PRD_OPTN<>'1' 
AND(H.CALC_TYPE=A.CALC_TYPE OR H.HIST_TYPE= 'G') 
ORDER BY RA.EMPLID,H.PRC_ORD_TS,RA.EMPL_RCD,RA.PIN_NUM

PS_GP_PYE_HIST_WRK is equi-joined to PS_GP_RSLT_ACUM by all 7 cluster key columns, so the cluster key index can satisfy this join.  The plan has switched to using the cluster key index.

Plan hash value: 4007126853
 
-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                         |       |       |  2369 (100)|          |
|   1 |  SORT ORDER BY                          |                         |   133 | 36841 |  2369   (1)| 00:00:01 |
|*  2 |   FILTER                                |                         |       |       |            |          |
|*  3 |    HASH JOIN                            |                         |   133 | 36841 |  2368   (1)| 00:00:01 |
|   4 |     NESTED LOOPS                        |                         |   393 |   103K|  2348   (1)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| PS_GP_PYE_HIST_WRK      |  1164 |   156K|    12   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                  | PS_GP_PYE_HIST_WRK      |     1 |       |    11   (0)| 00:00:01 |
|*  7 |      TABLE ACCESS CLUSTER               | PS_GP_RSLT_ACUM         |     1 |   132 |     3   (0)| 00:00:01 |
|*  8 |       INDEX UNIQUE SCAN                 | PS_GP_RSLT_CLUSTER_IDX  |     1 |       |     1   (0)| 00:00:01 |
|   9 |     INDEX FAST FULL SCAN                | PSBGP_ACCUMULATOR       |  9208 | 64456 |    20   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

The profile of the ASH data by plan line ID shows that most of the time is spent on physical I/O on line 7 of the plan, physically scanning the blocks in the cluster for each cluster key.

   SQL Plan         SQL Plan                                               H   P E        ASH
 Hash Value          Line ID EVENT                                         P     x       Secs
----------- ---------------- --------------------------------------------- --- - --- --------
 4007126853                7 db file sequential read                       N   N Y        120
 4007126853                  db file sequential read                       N   N Y         80

I can force the plan back to using the unique index on PS_GP_RSLT_ACUM with a hint, SQL Profile, SQL Patch, or SQL Plan Baseline, and there is a reduction in database response time.

NB: You cannot make a cluster key index invisible.

Plan hash value: 1843812660
 
------------------------------------------------------------------------------------------------------------------
|   Id  | Operation                                 | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                          |                    |       |       |   845 (100)|          |
|     1 |  SORT ORDER BY                            |                    |     1 |   277 |   845   (1)| 00:00:01 |
|  *  2 |   FILTER                                  |                    |       |       |            |          |
|  *  3 |    HASH JOIN                              |                    |     1 |   277 |   844   (1)| 00:00:01 |
|-    4 |     NESTED LOOPS                          |                    |     1 |   277 |   844   (1)| 00:00:01 |
|-    5 |      STATISTICS COLLECTOR                 |                    |       |       |            |          |
|     6 |       NESTED LOOPS                        |                    |     1 |   270 |   843   (1)| 00:00:01 |
|     7 |        TABLE ACCESS BY INDEX ROWID        | PS_GP_PYE_HIST_WRK |   416 | 57408 |     6   (0)| 00:00:01 |
|  *  8 |         INDEX RANGE SCAN                  | PS_GP_PYE_HIST_WRK |     1 |       |     5   (0)| 00:00:01 |
|  *  9 |        TABLE ACCESS BY INDEX ROWID BATCHED| PS_GP_RSLT_ACUM    |     1 |   132 |     5   (0)| 00:00:01 |
|  * 10 |         INDEX RANGE SCAN                  | PS_GP_RSLT_ACUM    |     1 |       |     4   (0)| 00:00:01 |
|- * 11 |      INDEX RANGE SCAN                     | PSBGP_ACCUMULATOR  |     1 |     7 |     1   (0)| 00:00:01 |
|    12 |     INDEX FAST FULL SCAN                  | PSBGP_ACCUMULATOR  |     1 |     7 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

   SQL Plan         SQL Plan                                               H   P E        ASH
 Hash Value          Line ID EVENT                                         P     x       Secs
----------- ---------------- --------------------------------------------- --- - --- --------
 1843812660               10 db file sequential read                       N   N Y         70
 1843812660                9 db file sequential read                       N   N Y         60
 1843812660                  CPU+CPU Wait                                  N   N Y         50
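
As one illustration of forcing the unique index without changing the application code, a SQL Patch could be created along the following lines. This is only a sketch: the SQL ID is supplied as a SQL*Plus substitution variable, and the query block name (SEL$1), the table alias (A) and the patch name are placeholders that I have assumed, not values taken from the statement above. The real values would have to come from the outline of the desired plan.

```sql
-- Sketch only: SEL$1, the alias A and the patch name are assumed placeholders.
-- Take the real query block and alias from the outline section of the plan,
-- e.g. from DBMS_XPLAN.DISPLAY_CURSOR with the +OUTLINE format option.
DECLARE
  l_patch_name VARCHAR2(128);
BEGIN
  l_patch_name := DBMS_SQLDIAG.CREATE_SQL_PATCH(
    sql_id    => '&sql_id',  -- SQL ID of the statement to be patched
    hint_text => 'INDEX(@"SEL$1" "A"@"SEL$1" "PS_GP_RSLT_ACUM")',
    name      => 'GP_RSLT_ACUM_UNIQUE_INDEX');  -- hypothetical patch name
END;
/
```

The function form of CREATE_SQL_PATCH that accepts a SQL ID is available from Oracle 12.2. A SQL Profile or SQL Plan Baseline would achieve a similar effect by a different mechanism.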

Table Cached Blocks 

The table_cached_blocks statistics preference specifies the average number of blocks assumed to be cached in the buffer cache when calculating the index clustering factor. When DBMS_STATS calculates the clustering factor of an index, it does not count a visit to a table block that is assumed to be cached because it was among the last n distinct table blocks visited, where n is the value of table_cached_blocks.

We have already seen that, with 7 cluster key columns, no more than 7 blocks are required to hold any one cluster key.  If I set table_cached_blocks to at least 7, then when Oracle scans the table blocks in unique key order (which matches the cluster key order for the first 7 columns), it does not count additional visits to blocks for the same cluster key.  Thus, we see a reduction in the clustering factor on the unique index.  There is no advantage to a higher value of this setting: the clustering factors with a setting of 16 are almost identical to those with 8.  We do not see a significant reduction in the clustering factor on other indexes with different leading columns.
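
A minimal sketch of how the preference might be set and the statistics regathered is shown below. It assumes the tables are owned by SYSADM (an assumption, adjust for your schema) and uses the value 8, which is the setting behind the TCB=8 figures. Gathering statistics on tables of this size is expensive, so treat it as an outline rather than a script to run as is.

```sql
-- Sketch only: the owner SYSADM is an assumption.
BEGIN
  FOR t IN (
    SELECT 'PS_GP_RSLT_ABS'  table_name FROM dual UNION ALL
    SELECT 'PS_GP_RSLT_ACUM' table_name FROM dual UNION ALL
    SELECT 'PS_GP_RSLT_PIN'  table_name FROM dual
  ) LOOP
    -- Record the preference against the table; it is used the next time
    -- DBMS_STATS calculates the clustering factor of the table's indexes.
    DBMS_STATS.SET_TABLE_PREFS(
      ownname => 'SYSADM',
      tabname => t.table_name,
      pname   => 'TABLE_CACHED_BLOCKS',
      pvalue  => '8');

    -- Regather statistics, including the indexes, so that the new
    -- clustering factors are calculated.
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname => 'SYSADM',
      tabname => t.table_name,
      cascade => TRUE);
  END LOOP;
END;
/
```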

TCB=1

TABLE_NAME           INDEX_NAME               PREFIX_LENGTH LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR DEGREE     LAST_ANALYZED    
-------------------- ------------------------ ------------- ----------- ---------- ----------------- ---------- -----------------
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS                       8     1271559  152019130          10806251 1          12-01-24 15:33:02
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM                      8     8421658  762210387         101166426 1          12-01-24 15:37:55
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN                       9     3894799  327189471          31774872 1          12-01-24 15:39:00

TCB=8

TABLE_NAME           INDEX_NAME               PREFIX_LENGTH LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR DEGREE     LAST_ANALYZED    
-------------------- ------------------------ ------------- ----------- ---------- ----------------- ---------- -----------------
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS                       8     1271559  152019130           8217000 1          12-01-24 15:05:42
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM                      8     8421658  762210387          16658798 1          12-01-24 15:10:40
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN                       9     3894799  327189471          11321888 1          12-01-24 15:01:37

TCB=16

TABLE_NAME           INDEX_NAME               PREFIX_LENGTH LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR DEGREE     LAST_ANALYZED    
-------------------- ------------------------ ------------- ----------- ---------- ----------------- ---------- -----------------
PS_GP_RSLT_ABS       PS_GP_RSLT_ABS                       8     1271559  152019130           8217000 1          12-01-24 15:44:25
PS_GP_RSLT_ACUM      PS_GP_RSLT_ACUM                      8     8421658  762210387          16658710 1          12-01-24 15:49:29
PS_GP_RSLT_PIN       PS_GP_RSLT_PIN                       9     3894799  327189471          11321888 1          12-01-24 15:50:36

The reduction in the clustering factor can mitigate the optimizer's tendency to use the cluster key index, but the optimizer may still choose it.
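
Figures like those in the listings above can be reported from the data dictionary. The query below is a sketch that assumes you are connected as the owner of the tables (hence USER_INDEXES and USER); it also reports the current table_cached_blocks preference for each table.

```sql
-- Sketch only: assumes the session is connected as the owner of the tables.
select i.table_name, i.index_name, i.prefix_length, i.leaf_blocks
,      i.num_rows, i.clustering_factor, i.degree, i.last_analyzed
,      dbms_stats.get_prefs('TABLE_CACHED_BLOCKS', user, i.table_name) tcb
from   user_indexes i
where  i.table_name in ('PS_GP_RSLT_ABS','PS_GP_RSLT_ACUM','PS_GP_RSLT_PIN')
and    i.uniqueness = 'UNIQUE'
order by i.table_name, i.index_name
/
```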

NB: table_cached_blocks applies only when gathering statistics with DBMS_STATS.  It does not apply to the statistics calculated by CREATE INDEX or ALTER INDEX ... REBUILD operations, which use the default value of 1.  This is not a bug; it is described in the DBMS_STATS documentation.
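
In practice, this means the index statistics should be regathered with DBMS_STATS after any rebuild. A minimal sketch, assuming the session owns the index and that the table preference has already been set:

```sql
-- Sketch only: assumes the index is in the current schema.
ALTER INDEX ps_gp_rslt_acum REBUILD;

-- The rebuild calculates the clustering factor as if table_cached_blocks
-- were 1, so regather the index statistics with DBMS_STATS afterwards.
BEGIN
  DBMS_STATS.GATHER_INDEX_STATS(
    ownname => USER,
    indname => 'PS_GP_RSLT_ACUM');
END;
/
```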

TL;DR

The statistics on the cluster key index may lead the optimizer to determine that the cost of using it is lower than that of the unique index.  The switch from the unique/primary key index to the cluster key index may result in poorer performance.  Setting table_cached_blocks on the tables in the cluster may help.  However, you may still need to use SQL Profiles, SQL Plan Baselines, or SQL Patches to force the optimizer to continue to use the unique indexes.

Friday, February 16, 2024

Table Clusters: 4. Checking the Cluster Key

This post is part of a series that discusses table clustering in Oracle.
  1. Introduction and Ancient History
  2. Cluster & Cluster Key Design Considerations
  3. Populating the Cluster with DBMS_PARALLEL_EXECUTE
  4. Checking the Cluster Key
  5. Using the Cluster Key Index instead of the Primary/Unique Key Index
  6. Testing the Cluster & Conclusion (TL;DR)

This query reports how many data blocks each cluster key occupies.  It uses DBMS_ROWID to extract the block number from the ROWID of each row, counts the number of distinct blocks per cluster key, and then counts how often each number of blocks per key occurs.

with x as ( --cluster key and rowid of each row
  select emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM
  ,      DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) block_no from ps_gp_rslt_abs
), y as ( --count number of rows per cluster key and block number
  select /*+MATERIALIZE*/ emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM
  ,      block_no, count(*) num_rows 
  from   x   
  group by emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM, block_no
), z as ( --count number of blocks and rows per cluster key
  select /*+MATERIALIZE*/ emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM
  ,      count(distinct block_no) num_blocks, sum(num_rows) num_rows 
  from   y
  group by emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, ORIG_CAL_RUN_ID, RSLT_SEG_NUM
)
select num_blocks, count(distinct emplid) emplids
,      sum(num_rows) sum_rows
,      median(num_rows) median_rows
,      median(num_rows)/num_blocks median_rows_per_block
from   z
group by num_blocks
order by num_blocks
/

The answer you get depends on the data, so your mileage will vary.  

Initially, I built the cluster with 3 columns in the key.  In my case, 81% of rows belonged to cluster keys that occupied no more than 2 data blocks.

NUM_BLOCKS    EMPLIDS   SUM_ROWS MEDIAN_ROWS MEDIAN_ROWS_PER_BLOCK
---------- ---------- ---------- ----------- ---------------------
         1      69638   46809975          12                    12
         2      47629   78370682          34                    17
         3      12120   14330976          68            22.6666667
         4       4598    4395844          94                  23.5
         5       2376    6941389         124                  24.8
         6        652    2510790         155            25.8333333
         7         27      34527         185            26.4285714
         8         14      12330         217                27.125
         9          9      40633         248            27.5555556
        10          1      14607         279                  27.9
        11          1        310         310            28.1818182
        12          2       2212         310            25.8333333
        13          1       1476         372            28.6153846
        14          1        372         372            26.5714286

I rebuilt the cluster with 7 key columns.  Now no cluster key occupies more than 7 blocks, most of the keys are in a single block, and 85% of rows belong to keys that occupy no more than 2 blocks.  Increasing the length of the cluster key also resulted in the table being smaller because each cluster key is stored only once.

NUM_BLOCKS    EMPLIDS   SUM_ROWS MEDIAN_ROWS    MEDIAN_ROWS_PER_BLOCK
---------- ---------- ---------- -----------  ---------------------
         1      74545   71067239          14                    14
         2      52943   57481538          40                    20
         3      13553   11185685          73            24.3333333
         4       4567    8949787         120                    30
         5       1327    3251707         150                    30
         6        144      81977         160            26.6666667
         7          3       1197       204.5            29.2142857

There is now only a small number of employees whose data is spread across many cluster blocks.  They might be slower to access, but I think I have a reasonable balance.
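
To see which cluster keys are spread across the most blocks, the same approach can be turned into a listing of individual keys. This is a sketch: the threshold of 6 blocks and the limit of 20 rows are arbitrary values chosen for illustration, and the FETCH FIRST clause requires Oracle 12c or later.

```sql
-- Sketch only: the threshold (6 blocks) and row limit (20) are arbitrary.
select emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, orig_cal_run_id, rslt_seg_num
,      count(distinct dbms_rowid.rowid_block_number(rowid)) num_blocks
,      count(*) num_rows
from   ps_gp_rslt_abs
group by emplid, cal_run_id, empl_rcd, gp_paygroup, cal_id, orig_cal_run_id, rslt_seg_num
having count(distinct dbms_rowid.rowid_block_number(rowid)) >= 6
order by num_blocks desc, num_rows desc
fetch first 20 rows only
/
```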