Tuesday, July 01, 2025

Deadlock within DML statements

Oracle maintains a very detailed note, Troubleshooting "ORA-00060 Deadlock Detected" Errors (Doc ID 62365.1).  

This blog demonstrates that it is also possible to produce a deadlock with a single DML statement that updates multiple rows in a different order than another, potentially identical, statement.  

What is a Deadlock?

Oracle’s note explains that “A deadlock occurs when a session (A) wants a resource held by another session (B), but that session also wants a resource held by the first session (A). There can be more than 2 sessions involved, but the idea is the same.”

The key point is that two or more sessions demand the same resources in a different order.
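The classic case can be sketched as two sessions updating the same two rows in opposite orders (a hypothetical table DEPT with rows 1 and 2 is assumed purely for illustration):

```sql
-- Session A:                             Session B:
UPDATE dept SET loc = 'A' WHERE id = 1;
                                          UPDATE dept SET loc = 'B' WHERE id = 2;
UPDATE dept SET loc = 'A' WHERE id = 2;   -- Session A now waits for B's lock on row 2
                                          UPDATE dept SET loc = 'B' WHERE id = 1;
                                          -- Session B waits for A's lock on row 1.
                                          -- Neither can proceed; one receives ORA-00060.
```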

It is not a Database Error

NOTE: Deadlock errors are usually not the underlying root cause of a problem, rather they are more likely to be an indicator of another issue in the application or elsewhere. Once the resultant trace file has been examined … to determine the objects involved, it is then worth thinking about what could be causing such a deadlock - for example a batch program being run more than once by mistake or in the wrong order, or by not following programming standards in an application.

Identification and resolution of the underlying issue then makes the error redundant.

Diagnostic information produced by a Deadlock

"ORA-00060 error normally writes the error message in the alert.log, together with the name of the trace file created. The exact format of this varies between Oracle releases. The trace file will be written to the directory indicated by the USER_DUMP_DEST or BACKGROUND_DUMP_DEST, depending on the type of process that creates the trace file.

The trace file will contain a deadlock graph and additional information similar to that shown below. This is the trace output from the above example, which signalled an ORA-00060..."

The trace file always contains this reminder:

"DEADLOCK DETECTED ( ORA-00060 ) [Transaction Deadlock]

The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL. The following information may aid in determining the deadlock"

This is followed by a deadlock graph that shows the sessions and database locks involved, and hence the objects being locked. The SQL statements involved also appear in the trace.

Deadlock graph:

                      ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00050018-000004fa        22     132     X             19     191           X
TX-00070008-00000461        19     191     X             22     132           X

session 132: DID 0001-0016-00000005     session 191: DID 0001-0013-0000000C
session 191: DID 0001-0013-0000000C     session 132: DID 0001-0016-00000005

Set Processing SQL Demonstration

This is a simplified example of producing a deadlock using the same SQL statement, executing in different sessions with different indexes, and therefore updating the rows in a different order.

Setup

I have created a PL/SQL function that calls DBMS_SESSION.SLEEP to pause the session for a specified number of seconds before returning the current system timestamp.  The purpose of this is simply to slow down the update so that it is easier to demonstrate the deadlock without creating a much larger table.
CREATE OR REPLACE FUNCTION my_slow(p_seconds NUMBER) RETURN timestamp IS 
BEGIN
  dbms_session.sleep(p_seconds);
  RETURN systimestamp; 
END;
/
I create a small table with just 10 rows and three columns.  
  • Column A has a sequence of integer values, 1 to 10.  This column is the primary key.
  • Column B has random numerical values in the range 1 to 100, but when the rows are sorted by this column, they will come out in a different order.  This column is the subject of another index.
  • Column C is a timestamp that will be updated with the system timestamp during the test so that we can see what order the rows are updated in. 
CREATE TABLE t 
(a number, b number, c timestamp
,CONSTRAINT t_pk PRIMARY KEY (a))
/
CREATE INDEX t_b ON t (b)
/
INSERT INTO t (a, b)
SELECT level, dbms_random.value(1,100) FROM DUAL 
CONNECT BY level <= 10
/
COMMIT;
EXEC dbms_stats.gather_table_stats(user,'T');

Statement 1

I will update the table T in each of the two sessions with an almost identical update statement.  The only difference between the statements is the column referenced in the where clause, and it is that which dictates the index used.
UPDATE t
SET c = my_slow(1)
WHERE a > 0
/
If I run each of these statements in isolation, they are successful.  I can obtain the execution plan and see when the rows were updated.  The statement updates at the rate of 1 row per second, because the MY_SLOW() function includes a 1-second pause.
The first statement uses the primary key index on column A, and we can see that the rows are updated in the order of the values in column A because Oracle has range-scanned the primary key index T_PK.
A full scan would have produced the same order in this test, but I want to emphasise that the problem occurs with a change of index.

Plan hash value: 2174628095

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | UPDATE STATEMENT  |      |    10 |    40 |     2   (0)| 00:00:01 |
|   1 |  UPDATE           | T    |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| T_PK |    10 |    40 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

SELECT * FROM t ORDER BY c;

         A          B C                  
---------- ---------- -------------------
         1 43.6037759 25/06/2025 17:08:03
         2 75.8443964 25/06/2025 17:08:04
         3 32.7061872 25/06/2025 17:08:05
         4 92.3717375 25/06/2025 17:08:07
         5 99.2611075 25/06/2025 17:08:08
         6 18.9198972 25/06/2025 17:08:09
         7 21.8558534 25/06/2025 17:08:10
         8 15.9224485 25/06/2025 17:08:11
         9 94.3695186 25/06/2025 17:08:12
        10 38.7300478 25/06/2025 17:08:13

Statement 2

In the second statement, the where clause has a condition on column B.
UPDATE t
SET c = my_slow(1)
WHERE b > 0
/
Now the update statement range scans index T_B on column B, and we can see that the rows were updated in the order of the values in column B.

Plan hash value: 2569189006

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | UPDATE STATEMENT  |      |    10 |   230 |     2   (0)| 00:00:01 |
|   1 |  UPDATE           | T    |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| T_B  |    10 |   230 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

SELECT * FROM t ORDER BY c;
         A          B C                  
---------- ---------- -------------------
         8 15.9224485 25/06/2025 17:08:15
         6 18.9198972 25/06/2025 17:08:16
         7 21.8558534 25/06/2025 17:08:17
         3 32.7061872 25/06/2025 17:08:18
        10 38.7300478 25/06/2025 17:08:19
         1 43.6037759 25/06/2025 17:08:20
         2 75.8443964 25/06/2025 17:08:21
         4 92.3717375 25/06/2025 17:08:22
         9 94.3695186 25/06/2025 17:08:23
         5 99.2611075 25/06/2025 17:08:24

Deadlock

If I run the two update statements simultaneously in different sessions, then one of them succeeds, and the other fails with a deadlock error.  
UPDATE t
       *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
After numerous tests, it appears to be arbitrary which statement succeeds and which fails.  It is just a matter of which statement gets slightly ahead and which detects the deadlock first.
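One way to run the two statements concurrently is to launch them from two background SQL*Plus sessions (a sketch; the script names and credentials are hypothetical):

```
# update_by_a.sql and update_by_b.sql are hypothetical scripts, each containing
# one of the two UPDATE statements above, followed by COMMIT and EXIT.
sqlplus scott/tiger @update_by_a.sql &   # WHERE a > 0, drives index T_PK
sqlplus scott/tiger @update_by_b.sql &   # WHERE b > 0, drives index T_B
wait                                     # one update succeeds; the other raises ORA-00060
```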

QED

Thus, it is possible to produce a deadlock solely through SQL set-based processing, without any procedural code.  Two similar DML statements, differing only in the index they use, and therefore the order in which they process rows, produced a deadlock.  
As with any deadlock, the root cause is the difference in the order in which the sessions acquire the same resources.

Monday, May 05, 2025

Configuring SQL Developer to Authenticate Via Kerberos

Kerberos is a trusted third-party authentication system that relies on shared secrets and presumes that the third party is secure (see the Oracle 19c Security Guide: Configuring Kerberos Authentication).  The Oracle client can be configured to use Kerberos, and SQL Developer can authenticate with Kerberos via the Oracle client.  Various Kerberos parameters are specified in sqlnet.ora.  Two parameters must be copied to the SQL Developer configuration so that thin connections can authenticate using Kerberos.

  • SQLNET.KERBEROS5_CC_NAME: the complete path name to the Kerberos credentials cache (CC) file.
  • SQLNET.KERBEROS5_CONF: the complete path name to the Kerberos configuration file, which contains the realm for the default Key Distribution Center (KDC) and maps realms to KDC hosts.  The default location on Windows is c:\krb5\krb.conf.

This parameter may also be set:

  • SQLNET.KERBEROS5_CONF_LOCATION: the directory for the Kerberos configuration file. This parameter also specifies that the file is created by the system, and not by the client.
…
SQLNET.KERBEROS5_CONF=C:\oracle\19.3.0_32\network\admin\krb5.conf
SQLNET.KERBEROS5_CC_NAME=C:\oracle\19.3.0_32\network\admin\cache
…

The SQL Developer configuration is at Tools ➤ Preferences ➤ Database ➤ Advanced.

SQL Developer can make a 'thick' connection via the SQL*Net client.  The client's location can be specified, and verified, within the configuration screen.  The location of tnsnames.ora, if not in the default directory, can also be specified.

However, you can still make thin connections authenticated by Kerberos.  The locations of the Kerberos configuration file and credentials cache, shown in the sqlnet.ora parameters above, should be transferred to the Kerberos Thin Config settings in SQL Developer.

Then, SQL Developer thin connections can be configured to use Kerberos:

  • Authentication type is Kerberos:
    • the username and password are blank,
    • the 'save password' checkbox is unticked.
  • Connection type is Basic:
    • the hostname, port and service are the same as found in tnsnames.ora.

Error Messages

The message "Status : Failure -Test failed: IO Error: The service in process is not supported. Unable to obtain Principal Name for authentication  (CONNECTION_ID=…" indicates that the Kerberos ticket has expired and needs to be renewed or recreated.
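In that case, a new ticket can be obtained with the Oracle Kerberos client utilities (a sketch; the principal name fred@EXAMPLE.COM is hypothetical):

```
REM obtain a new ticket-granting ticket (prompts for the Kerberos password)
okinit fred@EXAMPLE.COM
REM list the cached tickets and their expiry times to confirm renewal
oklist
```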

Tuesday, March 18, 2025

Locally Partitioned Unique Indexes on Reference Partitioned Tables

Normally, if you want to locally partition a unique index, you must include the partitioning key in the index key. Otherwise, you will get error ORA-14039: partitioning columns must form a subset of key columns of a UNIQUE index.
CREATE TABLE PS_JRNL_HEADER 
(BUSINESS_UNIT VARCHAR2(5 CHAR) NOT NULL
,JOURNAL_ID VARCHAR2(10 CHAR) NOT NULL
,JOURNAL_DATE DATE NOT NULL
,UNPOST_SEQ NUMBER NOT NULL
…
) 
PARTITION BY RANGE (fiscal_year) INTERVAL (1)
(PARTITION FISCAL_YEAR_2016 VALUES LESS THAN (2017))
…
/

CREATE UNIQUE INDEX PS_JRNL_HEADER 
ON PS_JRNL_HEADER (BUSINESS_UNIT, JOURNAL_ID, JOURNAL_DATE, UNPOST_SEQ)
LOCAL
/

ORA-14039: partitioning columns must form a subset of key columns of a UNIQUE index
This rule also applies to indexes on reference partitioned tables but in a slightly different way. The unique key on the child table cannot contain the partitioning key because those columns are only on the parent table. However, it can still be locally partitioned if it includes the parent table's primary key. 
Here is an example taken from PeopleSoft General Ledger. We can't add foreign keys to the PeopleSoft database, but we can add them to an archive database to support queries of archived data.
CREATE TABLE PS_JRNL_HEADER 
(BUSINESS_UNIT VARCHAR2(5 CHAR) NOT NULL
,JOURNAL_ID VARCHAR2(10 CHAR) NOT NULL
,JOURNAL_DATE DATE NOT NULL
,UNPOST_SEQ NUMBER NOT NULL
…
,CONSTRAINT PS_JRNL_HEADER PRIMARY KEY (BUSINESS_UNIT, JOURNAL_ID, JOURNAL_DATE, UNPOST_SEQ)
) 
PARTITION BY RANGE (fiscal_year) INTERVAL (1)
SUBPARTITION BY RANGE (accounting_period) 
SUBPARTITION TEMPLATE 
(SUBPARTITION accounting_period_00 VALUES LESS THAN (1)
…
,SUBPARTITION accounting_period_12 VALUES LESS THAN (13)
,SUBPARTITION accounting_period_max VALUES LESS THAN (maxvalue)
)
(PARTITION FISCAL_YEAR_2016 VALUES LESS THAN (2017))
COMPRESS FOR QUERY LOW 
/

CREATE TABLE PS_JRNL_LN 
(BUSINESS_UNIT VARCHAR2(5 CHAR) NOT NULL
,JOURNAL_ID VARCHAR2(10 CHAR) NOT NULL
,JOURNAL_DATE DATE NOT NULL
,UNPOST_SEQ NUMBER NOT NULL 
,JOURNAL_LINE NUMBER(9,0) NOT NULL
,LEDGER VARCHAR2(10 CHAR) NOT NULL
…
,CONSTRAINT PS_JRNL_LN PRIMARY KEY (BUSINESS_UNIT, JOURNAL_ID, JOURNAL_DATE, UNPOST_SEQ, JOURNAL_LINE, LEDGER)
,CONSTRAINT PS_JRNL_LN_FK FOREIGN KEY (BUSINESS_UNIT, JOURNAL_ID, JOURNAL_DATE, UNPOST_SEQ) REFERENCES PS_JRNL_HEADER 
)
PARTITION BY REFERENCE(PS_JRNL_LN_FK)
COMPRESS FOR ARCHIVE LOW 
/
If I try to locally partition a unique index without one of the parent table's unique key columns, I get ORA-14039, which is exactly as I might expect.
CREATE UNIQUE INDEX PS_JRNL_LN2 
ON PS_JRNL_LN (BUSINESS_UNIT, JOURNAL_ID, JOURNAL_DATE, /*UNPOST_SEQ,*/ JOURNAL_LINE, LEDGER) 
LOCAL COMPRESS 3
/

ORA-14039: partitioning columns must form a subset of key columns of a UNIQUE index
14039. 00000 -  "partitioning columns must form a subset of key columns of a UNIQUE index"
*Cause:    User attempted to create a UNIQUE partitioned index whose
           partitioning columns do not form a subset of its key columns
           which is illegal
*Action:   If the user, indeed, desired to create an index whose
           partitioning columns do not form a subset of its key columns,
           it must be created as non-UNIQUE; otherwise, correct the
           list of key and/or partitioning columns to ensure that the index'
           partitioning columns form a subset of its key columns

What is going on here? 

  • JRNL_HEADER is partitioned on FISCAL_YEAR and sub-partitioned on ACCOUNTING_PERIOD. 
  • JRNL_LN is reference-partitioned. Reference partitioning requires an enforced foreign key constraint. JRNL_LN has a foreign key constraint that references JRNL_HEADER. Thus, there is a 1:1 relationship of partitions on JRNL_LN (the child table) to partitions, or in this case sub-partitions, on JRNL_HEADER (the parent table).
SELECT table_name, composite, partition_name, subpartition_count, partition_position, high_value
FROM   user_tab_partitions
WHERE  table_name LIKE 'PS_JRNL%' ORDER BY 1 DESC
/

                                                                 SubP Part
TABLE_NAME         COM PARTITION_NAME                           Count  Pos HIGH_VALUE
------------------ --- ---------------------------------------- ----- ---- --------------------
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_00        0    1
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_01        0    2
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_02        0    3
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_03        0    4
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_04        0    5
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_05        0    6
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_06        0    7
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_07        0    8
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_08        0    9
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_09        0   10
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_10        0   11
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_11        0   12
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_12        0   13
PS_JRNL_LN         NO  FISCAL_YEAR_2016_ACCOUNTING_PERIOD_MAX       0   14
PS_JRNL_HEADER     YES FISCAL_YEAR_2016                            14    1 2017

15 rows selected.
The partitioning keys recorded for the reference partitioned table JRNL_LN are the primary key columns on the parent table, although the parent table is itself partitioned and sub-partitioned by different columns (FISCAL_YEAR and ACCOUNTING_PERIOD).
SELECT * FROM user_part_key_columns WHERE name LIKE 'PS_JRNL%' ORDER BY 1,2 desc, 4
/

NAME                 OBJEC COLUMN_NAME          COLUMN_POSITION COLLATED_COLUMN_ID
-------------------- ----- -------------------- --------------- ------------------
PS_JRNL_HEADER       TABLE FISCAL_YEAR                        1                   
PS_JRNL_LN           TABLE BUSINESS_UNIT                      1                   
PS_JRNL_LN           TABLE JOURNAL_ID                         2                   
PS_JRNL_LN           TABLE JOURNAL_DATE                       3                   
PS_JRNL_LN           TABLE UNPOST_SEQ                         4
The parent table in a foreign key relationship must have a primary key, and the child table must reference it. The primary key of the parent table is a proxy for the partitioning key. The two tables effectively share the partitioning key through the 1:1 relationship of partitions. Each primary key on the parent table can only appear in a single sub-partition, and therefore, corresponding child rows can only appear in the corresponding partition in the child table. Therefore, the uniqueness of the locally partitioned index on the child table can be assured by inspecting just the local partition. 
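Accordingly, restoring the omitted parent key column UNPOST_SEQ allows the same unique index to be created locally (a sketch based on the table definitions above):

```sql
-- All four parent primary key columns are now present, so uniqueness can be
-- enforced within each local partition, and ORA-14039 is not raised.
CREATE UNIQUE INDEX PS_JRNL_LN2 
ON PS_JRNL_LN (BUSINESS_UNIT, JOURNAL_ID, JOURNAL_DATE, UNPOST_SEQ, JOURNAL_LINE, LEDGER) 
LOCAL COMPRESS 3
/
```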

Natural -v- Surrogate Keys

The example above uses natural keys: the key on the child table leads with the key columns of the parent table, followed by one or more additional key columns.  Thus, it is possible to locally partition the primary or unique key index on the child table.  
However, if a data model uses surrogate keys, then the key on the child table is a completely different column from the key on the parent table, and it would not be possible to locally partition a unique index unless it also contained the parent's surrogate key, in which case it could not be used to enforce uniqueness of the child key alone.
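A hypothetical surrogate-key version of the same structure illustrates the problem (all names here are invented for illustration):

```sql
CREATE TABLE jrnl_header
(jrnl_header_id NUMBER NOT NULL          -- parent surrogate key
,fiscal_year    NUMBER NOT NULL
,CONSTRAINT jrnl_header_pk PRIMARY KEY (jrnl_header_id))
PARTITION BY RANGE (fiscal_year) INTERVAL (1)
(PARTITION fiscal_year_2016 VALUES LESS THAN (2017))
/
CREATE TABLE jrnl_ln
(jrnl_ln_id     NUMBER NOT NULL          -- child surrogate key
,jrnl_header_id NUMBER NOT NULL          -- reference to the parent
,CONSTRAINT jrnl_ln_fk FOREIGN KEY (jrnl_header_id) REFERENCES jrnl_header)
PARTITION BY REFERENCE (jrnl_ln_fk)
/
-- The child's unique key does not contain the parent key,
-- so a local unique index raises ORA-14039
CREATE UNIQUE INDEX jrnl_ln_uk ON jrnl_ln (jrnl_ln_id) LOCAL
/
```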

TL;DR

It appears that the rule that partitioning columns must form a subset of key columns of a unique index has been relaxed. It hasn't really. Instead, the reference partition uses the primary key on the parent of the foreign key as a proxy for the partitioning key.

Tuesday, March 04, 2025

SQL Quarantine Behaviour When the Same SQL Executes in Different Resource Manager Consumer Groups With Different CPU Time Limits

What happens if I have two consumer groups with different CPU time limits, but the same long-running SQLs can be run in either group?

There is a use case for this question.  In my sample resource manager plan for PeopleSoft, there are consumer groups with different maximum CPU time limits: 4 hours for scheduled queries and 2 hours for SQL*Plus/SQL Developer sessions.  

The tests in this article are based on the examples in Tim Hall's article about SQL Quarantine in 19c.

N.B. SQL Quarantine is an Exadata-only feature.

I have created a simple plan with two consumer groups that have CPU time-outs of 30 and 60 seconds.  There are no group mappings; I will switch groups manually.

BEGIN 
  DBMS_RESOURCE_MANAGER.CLEAR_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;

  DBMS_RESOURCE_MANAGER.CREATE_PLAN('GFC_TEST_PLAN');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('LOW_LIMITED30_GROUP');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('LOW_LIMITED60_GROUP');

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'GFC_TEST_PLAN', 'LOW_LIMITED30_GROUP',
    mgmt_p8 => 1,
    switch_group => 'CANCEL_SQL', 
    switch_time => 30, 
    switch_for_call => TRUE);

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'GFC_TEST_PLAN', 'LOW_LIMITED60_GROUP', 
    mgmt_p8 => 1,
    switch_group => 'CANCEL_SQL', 
    switch_time => 60, 
    switch_for_call => TRUE);

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'GFC_TEST_PLAN', 'OTHER_GROUPS');

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;

  DBMS_RESOURCE_MANAGER_PRIVS.grant_switch_consumer_group(
    grantee_name   => user,
    consumer_group => 'LOW_LIMITED30_GROUP',
    grant_option   => FALSE);

  DBMS_RESOURCE_MANAGER_PRIVS.grant_switch_consumer_group(
    grantee_name   => user,
    consumer_group => 'LOW_LIMITED60_GROUP',
    grant_option   => FALSE);
END;
/

ALTER SYSTEM SET resource_manager_plan = 'GFC_TEST_PLAN';
I am going to create a PL/SQL function that will burn CPU as it goes around a loop.  It is just like Tim's, except that I specify the number of seconds for which I want mine to run, and it will return the number of times it has looped.
set serveroutput on timi on
CREATE OR REPLACE FUNCTION burn_cpu (p_secs IN NUMBER)
  RETURN NUMBER
AS
  l_start_time DATE;
  l_number     NUMBER := 1;
BEGIN
  l_start_time := SYSDATE;
  LOOP
    EXIT WHEN SYSDATE - l_start_time > (p_secs/86400);
    l_number := l_number + 1;
  END LOOP;
  RETURN l_number;
END;
/
I will start by switching to the consumer group LOW_LIMITED30_GROUP
DECLARE
  l_session v$session%ROWTYPE;
  l_consumer_group VARCHAR2(30):= 'LOW_LIMITED30_GROUP';
--l_consumer_group VARCHAR2(30):= 'LOW_LIMITED60_GROUP';
BEGIN 
  SELECT * INTO l_session FROM v$session WHERE sid = Sys_Context('USERENV', 'SID');
  dbms_output.put_line(l_session.sid||'.'||l_session.serial#||':'||l_session.module||':'||l_session.action||':'||l_session.resource_Consumer_Group);
  DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_SESS(l_session.sid,l_session.serial#,consumer_group=>l_consumer_group);
  SELECT * INTO l_session FROM v$session WHERE sid = Sys_Context('USERENV', 'SID');
  dbms_output.put_line(l_session.sid||'.'||l_session.serial#||':'||l_session.module||':'||l_session.action||':'||l_session.resource_Consumer_Group);
END;
/
I have patch 30104721 installed on 19c to backport the new parameters introduced in 21c, so I need to enable quarantine capture (see Oracle Doc ID 2635030.1: 19c New Feature SQL Quarantine - How To Stop Automatic SQL Quarantine).
ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_QUARANTINE = TRUE;
ALTER SESSION SET OPTIMIZER_USE_SQL_QUARANTINE = TRUE;
I can run queries that usually run for 45 and 75 seconds, but they will be stopped when they have consumed 30 seconds of CPU time.
SELECT burn_cpu (45) FROM dual
Error report -
ORA-12777: A non-continuable error encountered.  Check the error stack for additional information [40], [pfrruncae_check_async_events], [], [].
ORA-00040: active time limit exceeded - call aborted

Elapsed: 00:00:33.574
SELECT burn_cpu (75) FROM dual
Error report -
ORA-12777: A non-continuable error encountered.  Check the error stack for additional information [40], [pfrruncae_check_async_events], [], [].
ORA-00040: active time limit exceeded - call aborted

Elapsed: 00:00:34.023
After a short period, I have two quarantine directives:

select signature, name, sql_text, plan_hash_value, cpu_time, origin
 from dba_sql_quarantine
where sql_text like '%burn_cpu%'


             SIGNATURE NAME                                     SQL_TEXT                       PLAN_HASH_VALUE CPU_TIME   ORIGIN           
---------------------- ---------------------------------------- ------------------------------ --------------- ---------- ----------------
  12281544607895693562 SQL_QUARANTINE_anw6gs0gbta7u125daea2     SELECT burn_cpu (75) FROM dual       308129442 30         RESOURCE-MANAGER
   3516266497211383477 SQL_QUARANTINE_31m29rz6y8gpp125daea2     SELECT burn_cpu (45) FROM dual       308129442 30         RESOURCE-MANAGER
Now, the statements are immediately prevented from running.

SELECT burn_cpu (45) FROM dual
Error report -
ORA-56955: quarantined plan used

Elapsed: 00:00:00.414
SELECT burn_cpu (75) FROM dual
Error report -
ORA-56955: quarantined plan used

Elapsed: 00:00:00.416

Next, I will switch the consumer group to LOW_LIMITED60_GROUP.  The CPU limit is now 60 seconds.  Now, neither command returns ORA-56955.

The 45-second process runs to a successful conclusion.

SELECT burn_cpu (45) FROM dual;

BURN_CPU(45)
------------
    63264226

Elapsed: 00:00:45.846

The 75-second process runs for 65 seconds and is then terminated with ORA-00040 because it has reached the time limit, but not because the execution plan was quarantined.

SELECT burn_cpu (75) FROM dual
Error report -
ORA-12777: A non-continuable error encountered.  Check the error stack for additional information [40], [pfrruncae_check_async_events], [], [].
ORA-00040: active time limit exceeded - call aborted

Elapsed: 00:01:05.952

A quarantine directive with a lower CPU_TIME limit than that of the current consumer group is not applied because the statement may run to completion in a time between the lower limit and the higher limit.  Oracle allows the query to execute; it will be aborted when it reaches the higher CPU time.  

After a while, as the documentation indicates, quarantine capture will update the CPU time limit on the existing quarantine directive to the higher limit of the current consumer group.

The documentation gives an example: "The query in our example runs for 20 minutes only once, and then never again, unless the resource limit increases or the plan changes. If the limit is increased to 25 minutes, then the Resource Manager permits the statement to run again with the quarantined plan. If the statement runs for 23 minutes, which is below the new threshold, then the Resource Manager removes the plan from quarantine. If the statement runs for 26 minutes, which is above the new threshold, the plan remains in quarantine unless the limit is increased."

However, for the statement that runs successfully (for 45 seconds) in the consumer group with the higher limit (60 seconds), I have not seen the database remove the quarantine directive, even if I remove LOW_LIMITED30_GROUP consumer group from all resource plans and drop it from the database.

             SIGNATURE NAME                                     SQL_TEXT                       PLAN_HASH_VALUE CPU_TIME   ORIGIN           
---------------------- ---------------------------------------- ------------------------------ --------------- ---------- ----------------
  12281544607895693562 SQL_QUARANTINE_anw6gs0gbta7u125daea2     SELECT burn_cpu (75) FROM dual       308129442 60         RESOURCE-MANAGER
   3516266497211383477 SQL_QUARANTINE_31m29rz6y8gpp125daea2     SELECT burn_cpu (45) FROM dual       308129442 30         RESOURCE-MANAGER

The next time an attempt is made to execute the 75-second statement in either consumer group, it is quarantined as before.

SELECT burn_cpu (75) FROM dual
Error report -
ORA-56955: quarantined plan used

Conclusion

  • SQL Quarantines are not tied to any particular resource plan or consumer group.
  • If the current consumer group doesn't have any timeout set, then any matching SQL quarantine directive is not applied.
  • If a statement has a quarantine directive with a higher CPU time limit than the current consumer group, then it is applied immediately.
  • If a statement has a quarantine directive with a lower CPU time limit than the current consumer group, then it will be allowed to execute.  If the runtime then exceeds the consumer group CPU limit, then it will be timed out with ORA-00040.  The CPU time limit on the quarantine directive will be increased to the limit in the current consumer group.  In future, the statement will immediately be prevented from executing with ORA-56955 in both consumer groups.

Therefore, it is safe to allow different consumer groups in which the same long-running SQL statements may be executed to have different timeouts.  The lower quarantine timeout will not apply to executions in consumer groups with higher timeouts.  The SQL quarantine directives will evolve to have higher timeouts as required.

Monday, March 03, 2025

New Parameters In 21c To Control Automatic SQL Quarantine Can Be Backported To 19c

SQL Quarantine is only available on Exadata.  In 19c, automatic quarantine generation and subsequent use are enabled by default.  

In Oracle 21c, two new parameters have been introduced to control SQL Quarantine.

  • OPTIMIZER_CAPTURE_SQL_QUARANTINE enables or disables the automatic creation of SQL Quarantine configurations.  The default value is FALSE.  If enabled, when the Resource Manager terminates a SQL statement because the statement has exceeded resource limits, the database automatically creates a SQL Quarantine configuration for the execution plan used by the terminated SQL statement. 
  • OPTIMIZER_USE_SQL_QUARANTINE determines whether the optimizer considers SQL Quarantine configurations when choosing an execution plan for a SQL statement.  The default value is TRUE.

Thus, these parameters allow a system to use the SQL quarantine functionality but disable the automatic creation of quarantine configuration or restrict it to particular sessions.

This is a change in default behaviour, or at least a change to the original behaviour in 19c.  Quarantine directives will not be generated by default, but if they exist, the optimiser will apply them.

The parameters can be backported to Oracle 19.3 or later by applying patch 30104721 (see Oracle Doc ID 2635030.1: 19c New Feature SQL Quarantine - How To Stop Automatic SQL Quarantine).

These parameters can both be set at system and session level.
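For example, a system could leave automatic capture disabled globally but enable it in a particular session, while still applying any directives that already exist (a sketch using the two parameters described above):

```sql
-- System-wide: apply existing quarantine directives, but do not create new ones
ALTER SYSTEM SET optimizer_capture_sql_quarantine = FALSE;
ALTER SYSTEM SET optimizer_use_sql_quarantine = TRUE;

-- In a specific session that is permitted to create new directives
ALTER SESSION SET optimizer_capture_sql_quarantine = TRUE;
```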

If SQL quarantine configurations are not created, or set not to be used, or if you are not on Exadata, then SQLs will run to the CPU limit in the current consumer group before they are cancelled.