UKOUG Tech 2018 Conference, Liverpool

Sunday, November 11, 2018

How Not to Build A Data Warehouse

My day job involves investigating and resolving performance problems, so I get to see a lot of bad stuff.  Often, these problems have their roots in poor design.  It is not surprising, but is nonetheless disappointing, that when I point this out I am told that the system was either delivered this way by the vendor, or that it has already been built and it is too late to change.
In the last couple of years, I have worked on several data warehouse applications that have provided the inspiration for a new presentation that I am giving at the DOAG and UKOUG conferences this year.
The presentation and this series of related blogs have several objectives:
  • Partly, it is an attempt to get some advice out in the hope that some of these poor design decisions are not made in the first place. 
  • I get to have a bit of a rant about how to design a data warehouse properly, and I will feel better at the end of it.
  • I get to attend two really good conferences where I will learn (or be reminded of) much.
This particular blog post is an index to various topics within the presentation that I have turned into separate blog postings.

Oracle Sales History (SH) Sample Schema

I cannot illustrate issues with actual examples from client systems, no matter how well anonymised they are. So, I have recreated some of the most egregious mistakes using the Oracle Sales History example schema, SH.  Everything I am going to show you does have an origin in the real world!  The SH sample schema is a simple and very typical star schema.  It is well designed and implemented, and represents good practice. I encourage you to install and play with it for yourself.
My examples use a single fact table (SALES) that has five dimensions around it (CUSTOMERS, CHANNELS, PRODUCTS, PROMOTIONS and TIMES), although I will only use three. COUNTRIES is a dimension on the CUSTOMERS dimension, sometimes called a 'snowflake'.
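A typical star query against this schema joins the fact table to just the dimensions it needs. As a sketch (column names as in the delivered SH schema):

```sql
-- Revenue by country and year: the SALES fact table joined to the
-- CUSTOMERS and TIMES dimensions, and the COUNTRIES snowflake.
SELECT co.country_name
,      t.calendar_year
,      SUM(s.amount_sold) total_revenue
FROM   sales     s
JOIN   customers cu ON cu.cust_id    = s.cust_id
JOIN   countries co ON co.country_id = cu.country_id
JOIN   times     t  ON t.time_id     = s.time_id
GROUP BY co.country_name, t.calendar_year
ORDER BY co.country_name, t.calendar_year;
```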
In some of my examples, I have deliberately broken the data model in exactly the same way that I have seen it broken in real life. So, if you find yourself saying "nobody would actually do that!", let me assure you that they did!

Common Mistakes in Data Warehouse Design and Build

As each blog post is published, I will add a link to them in this section.
  1. Lack of Foreign Keys
  2. Effective Dated Dimensions
  3. Dates that are not Dates

Other Topics

  • To Index or Not: Star Transformation -v- Full Scan-Bloom Filter
  • Snowflakes and Lost Skew
  • Engineered System Considerations
  • Will Anything Run on Autonomous Data Warehouse Cloud?

Tuesday, August 21, 2018

Parallel Execution of PL/SQL

A recent experience with text searching showed up some limitations on parallel execution of PL/SQL functions, and how to work around them.

Text Searching can be CPU Intensive?

It started with a requirement to find names that roughly matched a search string. The roughness of the match was determined by calculating the Levenshtein distance between each customer name and the search string. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. Oracle has implemented it as a function in a delivered package, UTL_MATCH.EDIT_DISTANCE().  This function isn't directly relevant to the rest of the discussion about parallelism, except that it is an example of CPU-intensive processing that doesn't alter the database, where parallel execution may be a legitimate tactic for improving performance.
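You can try the function directly.  For example, 'STEVEN' is two edits away from 'STEPHEN' (substitute P with V, delete H):

```sql
-- Two single-character edits turn STEPHEN into STEVEN
SELECT utl_match.edit_distance('STEPHEN','STEVEN') distance
FROM   dual;

  DISTANCE
----------
         2
```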
The examples in this article use the SH schema in the Oracle sample schemas (available on Github).

The following query searches case-insensitively for customer names within a Levenshtein distance of 3 from 'Stephen'. It finds 'Steven' and 'Staphany'.
set autotrace on timi on pages 99 lines 200
with x as (
select c.cust_first_name
,      utl_match.edit_distance(upper(cust_first_name),'STEPHEN') edit_distance
from customers c
)
select * from x
where edit_distance <= 3
/

CUST_FIRST_NAME      EDIT_DISTANCE
-------------------- -------------
Staphany                         3
Steven                           2
Steven                           2
However, to do so, Oracle had to full-scan the table and execute the function for every row in it.
Plan hash value: 2008213504
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  2775 | 19425 |   429   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |  2775 | 19425 |   429   (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("UTL_MATCH"."EDIT_DISTANCE"(UPPER("CUST_FIRST_NAME"),'STEPHEN')<=3)
Oracle implemented Levenshtein as a C function that is then called from PL/SQL, so it just consumes CPU and doesn't do anything else to the database. You can start to see why the user complained that the query with UTL_MATCH was slow.
However, the first question is: how much of that time do we spend on the full scan, and how much do we spend executing this function?

Follow the Time with Instrumentation 

For test purposes, I am going to build my own packaged functions with session instrumentation. Then I can use Active Session History (ASH) to work out where the time went.
NB: ASH requires a licence for the Diagnostics Pack
  • One function, levenshtein(), calls UTL_MATCH.EDIT_DISTANCE().
  • The other, dolittle(), is a control function that does nothing except the instrumentation. It is used to measure the intrusion effect of the instrumentation.
CREATE OR REPLACE package dmk AS
function levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
function dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
END;
/

CREATE OR REPLACE package body dmk AS
function levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER IS
  l_distance INTEGER;
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
BEGIN
  dbms_application_info.read_module(l_module,l_action); /*save current module/action*/
  dbms_application_info.set_action('levenshtein()'); /*set action for function*/
  l_distance := utl_match.edit_distance(UPPER(p1),UPPER(p2));
  dbms_application_info.set_action(l_action); /*restore previous action*/
  RETURN l_distance;
END levenshtein;

function dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER IS
  l_distance INTEGER;
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
BEGIN
  dbms_application_info.read_module(l_module,l_action);
  dbms_application_info.set_action('dolittle()');
  l_distance := 1;
  dbms_application_info.set_action(l_action);
  RETURN l_distance;
END dolittle;
END dmk;
/
Now, I will run a test query that executes UTL_MATCH for each of the 55,500 rows in the CUSTOMERS table, and repeats that 10 times inside a PL/SQL block.
set timi on autotrace on lines 500 serveroutput on
DECLARE
  l_counter INTEGER := 0;
BEGIN
  dbms_application_info.set_module('DMK LEVENSHTEIN TEST',null);
  FOR j IN 1..10 LOOP
    FOR i IN (
      select sum(dmk.levenshtein(cust_first_name,'STEPHEN')) a
      ,      sum(dmk.dolittle(cust_first_name,'STEPHEN')) b
      from   customers c
    ) LOOP
      l_counter := l_counter+i.b;
    END LOOP;
  END LOOP;
  dbms_application_info.set_module(null,null);
  dbms_output.put_line('Executions: '||l_counter);
END;
/
Executions: 555000

PL/SQL procedure successfully completed.

Elapsed: 00:00:25.08
We can see that the Levenshtein function was executed 555,000 times.  Now I will query the ASH data for this test and group it by ACTION.
set timi on autotrace off lines 500
column module format a25
column action format a25
select module, action, sql_id, sum(1) ash_Secs
from v$active_Session_History
where module = 'DMK LEVENSHTEIN TEST'
and sample_time >= SYSDATE-1/1440
group by module, action, sql_id
/

MODULE                    ACTION                    SQL_ID          ASH_SECS
------------------------- ------------------------- ------------- ----------
DMK LEVENSHTEIN TEST      dolittle()                7gp0w6qdvxrd2          2
DMK LEVENSHTEIN TEST      levenshtein()             7gp0w6qdvxrd2          3
DMK LEVENSHTEIN TEST                                7gp0w6qdvxrd2         20
The levenshtein() function accounted for 3 seconds of database time, and the function that does nothing except instrumentation accounted for 2 seconds, so the overhead of UTL_MATCH itself is only about 1 second, against 20 seconds in the rest of the SQL. In this test, the overhead of Levenshtein is low, but it would still be worth doing the full scan in parallel.

I Can't Get No Parallelism 

But there is a problem! I specified parallelism with a hint, but I don't get a parallel plan.
select /*+PARALLEL(C 4)*/ avg(utl_match.edit_distance(UPPER(cust_first_name),'STEPHEN'))
from customers c
/

Plan hash value: 1978308596
---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |     1 |     7 |   115   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE     |           |     1 |     7 |            |          |
|   2 |   SORT AGGREGATE    |           |     1 |     7 |            |          |
|   3 |    TABLE ACCESS FULL| CUSTOMERS | 55500 |   379K|   115   (0)| 00:00:01 |
---------------------------------------------------------------------------------
However, if I use a simple SQL function, then I do get parallelism.
select /*+PARALLEL(C 4)*/ max(cust_first_name)
from customers c
/

Plan hash value: 1221513835
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |     1 |     7 |   115   (0)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE        |           |     1 |     7 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |           |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000  |     1 |     7 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |           |     1 |     7 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |           | 55500 |   379K|   115   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| CUSTOMERS | 55500 |   379K|   115   (0)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 4 because of table property
So, UTL_MATCH.EDIT_DISTANCE() must be preventing parallelism in some way. There is a closed Oracle bug 169587070 that details exactly this problem with UTL_MATCH.

PARALLEL_ENABLE and PRAGMA RESTRICT_REFERENCES 

This is also documented default behaviour in PL/SQL. See the Database VLDB and Partitioning Guide: About Parallel Execution of Functions, Functions in Parallel Queries:
"A user-written function may be executed in parallel in any of the following cases: 
  • If it has been declared with the PARALLEL_ENABLE keyword
  • If it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES clause that indicates all of WNDS, RNPS, and WNPS 
  • If it is declared with CREATE FUNCTION and the system can analyze the body of the PL/SQL code and determine that the code neither writes to the database nor reads or modifies package variables" 
Oracle only wraps the bodies of its delivered packages, not the package specifications. So, you can see for yourself that the delivered UTL_MATCH package specification does not contain these declarations.
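You can inspect the specification from the data dictionary (the body, by contrast, will appear wrapped):

```sql
-- The package specification is readable in the data dictionary;
-- note the absence of PARALLEL_ENABLE or RESTRICT_REFERENCES.
SELECT text
FROM   dba_source
WHERE  owner = 'SYS'
AND    name  = 'UTL_MATCH'
AND    type  = 'PACKAGE'
ORDER BY line;
```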

Workarounds 

There are various workarounds for this.
  • I could add the PARALLEL_ENABLE declarations to the functions in the package specification of the delivered UTL_MATCH package. Although this works, I would certainly not be happy to alter a delivered Oracle package in any serious database without approval from Oracle Support.
  • Or, I could add the RESTRICT_REFERENCES pragma to the package specification. Again, although this works, it involves altering a delivered package.
  • However, I can wrap the delivered function in my own packaged function. One option is to declare it with PARALLEL_ENABLE (my personal preference).
CREATE OR REPLACE package dmk AS
FUNCTION levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER PARALLEL_ENABLE;
FUNCTION dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER PARALLEL_ENABLE;
END;
/
  • Alternatively, I can use RESTRICT_REFERENCES in the package specification, but I must include the TRUST keyword to override the lack of a pragma on the called packaged function.
CREATE OR REPLACE package dmk AS
FUNCTION levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
PRAGMA RESTRICT_REFERENCES (levenshtein,WNDS,RNDS,WNPS,RNPS,TRUST);
FUNCTION dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
PRAGMA RESTRICT_REFERENCES (dolittle,WNDS,RNDS,WNPS,RNPS,TRUST);
END;
/
Now, I get parallel execution of the unaltered delivered UTL_MATCH.EDIT_DISTANCE() function.
select /*+PARALLEL(C 4) FULL(C)*/ sum(dmk.levenshtein(cust_first_name,'STEPHEN')) a
,      sum(dmk.dolittle(cust_first_name,'STEPHEN')) b
from   customers c
/

Plan hash value: 1221513835
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |     1 |     7 |   115   (0)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE        |           |     1 |     7 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |           |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000  |     1 |     7 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |           |     1 |     7 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |           | 55500 |   379K|   115   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| CUSTOMERS | 55500 |   379K|   115   (0)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 4 because of table property

Conclusion: Parallel PL/SQL 

I can now repeat the earlier test, but with a parallel hint. The runtime goes down from 25 to 9 seconds, although about the same amount of database time is recorded by ASH. So parallelism can improve the response time for the end user, but it will not reduce the total CPU overhead. If anything, it is likely to increase overall CPU time.
set timi on lines 500
DECLARE
  l_counter INTEGER := 0;
BEGIN
  dbms_application_info.set_module('DMK LEVENSHTEIN TEST',null);
  FOR j IN 1..10 LOOP
    FOR i IN (
      select /*+PARALLEL(C 4)*/
             sum(dmk.levenshtein(cust_first_name,'STEPHEN')) a
      ,      sum(dmk.dolittle(cust_first_name,'STEPHEN')) b
      from   customers c
    ) LOOP
      l_counter := l_counter+i.b;
    END LOOP;
  END LOOP;
  dbms_application_info.set_module(null,null);
  dbms_output.put_line('Executions: '||l_counter);
END;
/
Executions: 555000

PL/SQL procedure successfully completed.

Elapsed: 00:00:09.34

set timi on autotrace off lines 500
column module format a32
column action format a32
select module, action, sql_id, sum(1) ash_Secs
from v$active_Session_History
where module = 'DMK LEVENSHTEIN TEST'
and sample_time >= SYSDATE-1/1440
group by module, action, sql_id
/

MODULE                           ACTION                           SQL_ID          ASH_SECS
-------------------------------- -------------------------------- ------------- ----------
DMK LEVENSHTEIN TEST                                              3391gqpzu5g2k         21
DMK LEVENSHTEIN TEST             levenshtein()                    3391gqpzu5g2k          5
DMK LEVENSHTEIN TEST             dolittle()                       3391gqpzu5g2k          4
NB: I have not been able to get parallelism to work with a PL/SQL function defined in a WITH clause, because you cannot specify PARALLEL_ENABLE there, and a pragma can only be specified in a package specification.
WITH
  FUNCTION levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER PARALLEL_ENABLE IS
  BEGIN
    RETURN utl_match.edit_distance(UPPER(p1),UPPER(p2));
  END;
select /*+PARALLEL(C 4)*/
       sum(levenshtein(cust_first_name,'STEPHEN')) a
from   customers c
/
ORA-06553: PLS-712: illegal option for subprogram LEVENSHTEIN

Tuesday, April 17, 2018

Choosing the Right Compression

Choosing the right database table compression matters.  It affects the size of the objects and also the performance of your application.  Compression makes objects smaller, and for many systems, this is an objective in itself, but it also affects performance.  As compressed objects are smaller, less physical I/O is required to access them.  However, that saving is paid for in CPU.  More CPU is required to compress the data when it is written, and then again (albeit usually less CPU) to decompress it when you read it.  The question is whether the saving in I/O pays for the additional CPU.

There are a number of forms of compression to choose between.  They have different characteristics.  The resulting objects are different sizes, and they require different amounts of CPU to compress and decompress.
  • Basic Compression:  This is not really compression, but a form of block-level de-duplication.  It has been available since Oracle 9i.  It is available on all platforms.
  • Hybrid Columnar Compression (HCC) is only available on Engineered Systems, the FS Flash Storage System and the ZFS Storage Appliance (ZFSSA).  There are four levels of compression, using three different compression algorithms.  HCC is frequently positioned as being highly effective at reducing the size of data warehouses, but it also has implications for performance.
    • Query Low: LZO algorithm
    • Query High: ZLIB algorithm
    • Archive Low: ZLIB algorithm higher level with a larger block size.
    • Archive High: BZIP2 algorithm.  The higher compression ratio comes at the expense of significantly increased CPU overhead on decompression.
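The compression level is a physical attribute of the segment, specified when the object is created or rebuilt.  As a sketch (table name is illustrative):

```sql
-- Create a table with HCC; the compression only applies to data
-- loaded in direct-path mode (e.g. CTAS, INSERT /*+APPEND*/).
CREATE TABLE sales_hcc
COMPRESS FOR QUERY LOW
AS SELECT * FROM sales;

-- Change the compression level of an existing table by rebuilding it.
ALTER TABLE sales_hcc MOVE COMPRESS FOR QUERY HIGH;
```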
Which one is right for your application?  It depends.  Ultimately, you need to test it for yourself.  In this blog, I am going to relate my experience of one particular system, including how I tested it.  Our objective was optimising performance, rather than reducing database size.
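Oracle delivers the DBMS_COMPRESSION package to estimate the compression ratio for a given level by sampling the data into a scratch tablespace.  A sketch (constant and parameter names as in 12c; the scratch tablespace is illustrative):

```sql
-- Estimate the HCC Query Low compression ratio for SH.SALES,
-- using USERS as the scratch tablespace for the sample.
DECLARE
  l_blkcnt_cmp   PLS_INTEGER;
  l_blkcnt_uncmp PLS_INTEGER;
  l_row_cmp      PLS_INTEGER;
  l_row_uncmp    PLS_INTEGER;
  l_cmp_ratio    NUMBER;
  l_comptype_str VARCHAR2(32767);
BEGIN
  dbms_compression.get_compression_ratio(
    scratchtbsname => 'USERS',
    ownname        => 'SH',
    objname        => 'SALES',
    subobjname     => NULL,
    comptype       => dbms_compression.comp_query_low,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  dbms_output.put_line(l_comptype_str||': '||ROUND(l_cmp_ratio,1));
END;
/
```

This gives an estimate of the size saving, but not of the CPU cost, which is why a comparative test is still necessary.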

My Test System

  • SuperCluster M7 (i.e. an Engineered System), Oracle 12.1.0.2, 2-node RAC, running PeopleSoft Financials 9.2.
  • In particular, this system runs a large number of complex PeopleSoft General Ledger nVision reports that use a number of reporting views on summary ledgers. There are materialized views built on the reporting views.  The summary ledgers are updated nightly.  Then the materialized views are fully refreshed in non-atomic mode so the tables are truncated and repopulated in direct-path mode.  Hence, they can be Hybrid Columnar Compressed without needing the Advanced Compression licence.  The materialized views are not indexed.  Thus they can only be full scanned, and these scans can be offloaded to the storage cells (i.e. smart scan).
The metrics presented here are extracted from a test system where the summary ledger tables are updated, the materialized views refreshed, and then 136 General Ledger (GL) report books are run on a regular daily schedule.  This approach ensures that each test is, as far as possible, run with the same initial conditions and only known parameters are changed each time.  In each test, a different compression level was set on the materialized views.
  • I have not investigated ARCHIVE compression.
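Between tests, the compression attribute of each materialized view was changed and the view refreshed non-atomically, so the data is reloaded in direct-path mode and the new compression takes effect.  A sketch (the view name is illustrative):

```sql
-- Set the compression level to be tested on the materialized view.
ALTER MATERIALIZED VIEW my_ledger_mv COMPRESS FOR QUERY LOW;

-- A non-atomic complete refresh truncates the table and repopulates
-- it in direct-path mode, so HCC is applied to the reloaded data.
BEGIN
  dbms_mview.refresh(list           => 'MY_LEDGER_MV',
                     method         => 'C',
                     atomic_refresh => FALSE);
END;
/
```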

Writing

                   ------------ Materialized Views ------------
Description        Size (MB)  Comp. %  Comp. Ratio  Elapsed  Total DB Time
No Compression       151,868       0%          1.0     1:01          18:13
Basic Compression     34,720      77%          4.4     1:12          22:54
Query Low             22,666      85%          6.7     1:00          18:27
Query High            14,086      91%         10.8     1:05          22:56
(all times are expressed in hours:minutes)
  • While basic compression does significantly reduce table sizes (4.4x), Hybrid Columnar Compression does better (6.7x – 10.8x).  
  • HCC Query Low only has a very small overhead while writing data.  The materialized views are refreshed in parallel, and while the total DB time is higher, the elapsed time is lower!
  • Both basic compression and HCC Query High take about 20% longer to write the data in direct path mode.

Reading

                   --------- nVision DB Time (h:mm) ---------  ----- % of Total -----
Description        On CPU+CPU Wait  cell smart scan   Others    Total   CPU  scan  Others
No Compression              201:26            97:12    18:15   316:54   64%   31%      6%
Basic Compression           466:13            66:03    31:07   563:24   83%   12%      6%
Query Low                   241:13            12:34    44:57   298:45   81%    4%     15%
Query High                  278:20             4:43    11:38   294:42   94%    2%      4%
The nVision GL reports only read the materialized views (although they do write some temporary working storage tables).
  • As the level of compression of the materialized views increases, the time spent on cell smart table scan (physical read) decreases; this is mainly because the objects are smaller.
  • The slightly surprising result is that the CPU overhead of basic compression is higher than that of either form of HCC, and more than twice that of the uncompressed tables.  Hence, it significantly increases the total database response time.
  • Both forms of query HCC perform better than the uncompressed table.  The increase in CPU time is more than compensated for by the reduction in time spent on physical I/O.
  • The Query Low test was affected by factors external to the test (a lot more time was spent on other wait events).  Otherwise, it would have outperformed Query High.  If we look at just CPU and physical read time, it is clearly the best option.
In an offloaded (i.e. smart) scan on an engineered system, decompression, as well as the physical I/O, is done in the storage cell.  The saving in time spent on physical I/O between Query Low and Query High is small because the disk subsystem performs so well on the engineered system.
On a non-engineered system, by contrast, decompression and I/O must be done by the database server; there is nowhere else to do them.  You may well find that a different level of compression is optimal.

Conclusion

  • Basic compression exhibits a surprisingly high CPU overhead.  In my test, the performance was poorer than with uncompressed tables.
  • HCC Query Low was generally the best option; the saving in I/O more than paid for the increase in CPU.
  • I also analysed the ASH data for the different tests broken down by materialized view.  I found it was better to leave smaller materialized views uncompressed but to compress the larger ones.  This is not surprising, since HCC adds additional structures.
These results and conclusions are from one set of tests on one particular application, on one particular 'engineered' hardware configuration, using one particular data set.  You need to conduct your own comparative tests and decide what is best for your system.

Monday, March 05, 2018

Profiling Execution Plans

In my previous blog post, I demonstrated how to identify the active lines in an adaptive execution plan on DBA_HIST_SQL_PLAN so that I could profile behaviour. This post demonstrates a practical application of that technique. This statement comes out of the PeopleSoft Financials Multi-currency Processing. A process extracts groups of rows in elimination sets into a working storage table and joins that table to the ledger table for further processing. The number of rows extracted from the elimination set selector table (PS_ELIM_CF_SEL2001 in this case) can vary in a single process from a few to a few thousand. However, the process does regather optimizer statistics each time.
SQL_ID fwcdxjy41j23n
--------------------
SELECT L.BUSINESS_UNIT,L.ACCOUNT,L.DEPTID,L.PRODUCT,L.BUDGET_REF,L.AFFIL
IATE,L.CHARTFIELD1,L.BOOK_CODE,L.GL_ADJUST_TYPE,L.DATE_CODE,SUM(L.POSTED
_TOTAL_AMT) FROM PS_ELIM_CF_SEL2001 S, PS_LEDGER L WHERE
S.ELIMINATION_SET='UK_OTHER' AND S.TREE_NODE_NUM=1234567890 AND
S.PROCESS_INSTANCE=0001234567 AND L.FISCAL_YEAR=2018 AND
L.LEDGER=S.LEDGER AND L.BUSINESS_UNIT=S.BUSINESS_UNIT AND
L.AFFILIATE=S.AFFILIATE AND L.ACCOUNT=S.ACCOUNT AND L.CURRENCY_CD='GBP'
AND L.STATISTICS_CODE=' ' AND (L.ACCOUNTING_PERIOD<=001 AND
((L.ACCOUNTING_PERIOD>=0 AND S.BALANCE_FWD_SW='Y') OR
L.ACCOUNTING_PERIOD>=1 AND S.BALANCE_FWD_SW='N')) GROUP BY
L.BUSINESS_UNIT,L.ACCOUNT,L.DEPTID,L.PRODUCT,L.BUDGET_REF,L.AFFILIATE,L.
CHARTFIELD1,L.BOOK_CODE,L.GL_ADJUST_TYPE,L.DATE_CODE
The changing literal values mean that every SQL statement has a different SQL_ID, so SQL baselines are not viable here. No one plan produces optimal performance, so a traditional SQL profile prescribing a particular plan is not the answer either. Oracle moves between three adaptive plans, depending at least partially upon the number of rows in the elimination table.
Plan hash value: 42743070
-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                    |       |       |   237K(100)|       |          |       |
|   1 |  SORT GROUP BY                         |                    |    13 |  1716 |   237K  (1)| 00:00:10 |       |       |
|-  2 |   HASH JOIN                            |                    |    13 |  1716 |   237K  (1)| 00:00:10 |       |       |
|   3 |    NESTED LOOPS                        |                    |    13 |  1716 |   237K  (1)| 00:00:10 |       |       |
|-  4 |     STATISTICS COLLECTOR               |                    |       |       |            |       |          |       |
|   5 |      TABLE ACCESS STORAGE FULL         | PS_ELIM_CF_SEL2001 |    13 |   728 |   138   (2)| 00:00:01 |       |       |
|   6 |     PARTITION RANGE OR                 |                    |     1 |    76 | 18249   (1)| 00:00:01 |KEY(OR)|KEY(OR)|
|   7 |      PARTITION LIST ITERATOR           |                    |     1 |    76 | 18249   (1)| 00:00:01 |   KEY |   KEY |
|   8 |       TABLE ACCESS BY LOCAL INDEX ROWID| PS_LEDGER          |     1 |    76 | 18249   (1)| 00:00:01 |   KEY |   KEY |
|   9 |        INDEX RANGE SCAN                | PSCLEDGER          |     1 |       | 18228   (1)| 00:00:01 |   KEY |   KEY |
|- 10 |    PARTITION RANGE OR                  |                    |     1 |    76 | 18249   (1)| 00:00:01 |KEY(OR)|KEY(OR)|
|- 11 |     PARTITION LIST JOIN-FILTER         |                    |     1 |    76 | 18249   (1)| 00:00:01 |:BF0000|:BF0000|
|- 12 |      TABLE ACCESS STORAGE FULL         | PS_LEDGER          |     1 |    76 | 18249   (1)| 00:00:01 |   KEY |   KEY |
-----------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3968216348
-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                    |       |       |  1679K(100)|       |          |       |
|   1 |  SORT GROUP BY                         |                    |    92 | 12236 |  1679K  (1)| 00:01:06 |       |       |
|   2 |   HASH JOIN                            |                    |    92 | 12236 |  1679K  (1)| 00:01:06 |       |       |
|-  3 |    NESTED LOOPS                        |                    |    92 | 12236 |  1679K  (1)| 00:01:06 |       |       |
|-  4 |     STATISTICS COLLECTOR               |                    |       |       |            |       |          |       |
|   5 |      TABLE ACCESS STORAGE FULL         | PS_ELIM_CF_SEL2001 |    92 |  5244 |   138   (2)| 00:00:01 |       |       |
|-  6 |     PARTITION RANGE OR                 |                    |     1 |    76 | 18249   (1)| 00:00:01 |KEY(OR)|KEY(OR)|
|-  7 |      PARTITION LIST ITERATOR           |                    |     1 |    76 | 18249   (1)| 00:00:01 |   KEY |   KEY |
|-  8 |       TABLE ACCESS BY LOCAL INDEX ROWID| PS_LEDGER          |     1 |    76 | 18249   (1)| 00:00:01 |   KEY |   KEY |
|-  9 |        INDEX RANGE SCAN                | PSCLEDGER          |     1 |       | 18228   (1)| 00:00:01 |   KEY |   KEY |
|  10 |    PARTITION RANGE OR                  |                    |     1 |    76 | 18249   (1)| 00:00:01 |KEY(OR)|KEY(OR)|
|  11 |     PARTITION LIST JOIN-FILTER         |                    |     1 |    76 | 18249   (1)| 00:00:01 |:BF0000|:BF0000|
|  12 |      TABLE ACCESS STORAGE FULL         | PS_LEDGER          |     1 |    76 | 18249   (1)| 00:00:01 |   KEY |   KEY |
-----------------------------------------------------------------------------------------------------------------------------

Plan hash value: 386495123
------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                    |       |       |  1763K(100)|       |          |       |
|   1 |  SORT GROUP BY                          |                    |     1 |   132 |  1763K  (2)| 00:01:09 |       |       |
|   2 |   HASH JOIN                             |                    |  2105 |   271K|  1763K  (2)| 00:01:09 |       |       |
|   3 |    PART JOIN FILTER CREATE              | :BF0000            |  2105 |   271K|  1763K  (2)| 00:01:09 |       |       |
|-  4 |     NESTED LOOPS                        |                    |  2105 |   271K|  1763K  (2)| 00:01:09 |       |       |
|-  5 |      STATISTICS COLLECTOR               |                    |       |       |            |       |          |       |
|   6 |       TABLE ACCESS STORAGE FULL         | PS_ELIM_CF_SEL2001 |  2113 |   115K|   138   (2)| 00:00:01 |       |       |
|-  7 |      PARTITION RANGE OR                 |                    |     1 |    76 |  1763K  (2)| 00:01:09 |KEY(OR)|KEY(OR)|
|-  8 |       PARTITION LIST ITERATOR           |                    |     1 |    76 |  1763K  (2)| 00:01:09 |   KEY |   KEY |
|-  9 |        TABLE ACCESS BY LOCAL INDEX ROWID| PS_LEDGER          |     1 |    76 |  1763K  (2)| 00:01:09 |   KEY |   KEY |
|- 10 |         INDEX RANGE SCAN                | PSCLEDGER          |       |       |            |       |      KEY |   KEY |
|  11 |    PARTITION RANGE OR                   |                    |    24M|  1743M|  1763K  (2)| 00:01:09 |KEY(OR)|KEY(OR)|
|  12 |     PARTITION LIST JOIN-FILTER          |                    |    24M|  1743M|  1763K  (2)| 00:01:09 |:BF0000|:BF0000|
|  13 |      TABLE ACCESS STORAGE FULL          | PS_LEDGER          |    24M|  1743M|  1763K  (2)| 00:01:09 |   KEY |   KEY |
------------------------------------------------------------------------------------------------------------------------------
Only a small proportion of the statements run by a process are captured by AWR snapshots. Because PeopleSoft generates so much dynamic, non-sharable SQL, I generally recommend reducing the snapshot interval to 15 minutes and collecting the top 100 statements (the defaults are hourly and the top 30 statements). Even then, only a limited number of executions are sampled, with the longer-running statements more likely to be captured.
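These AWR settings can be changed with the documented DBMS_WORKLOAD_REPOSITORY package. A minimal sketch (the values are the ones suggested above, not universal recommendations):

```sql
-- Reduce the AWR snapshot interval to 15 minutes and capture the
-- top 100 SQL statements per snapshot (defaults: 60 minutes, top 30).
BEGIN
  DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
    interval => 15,    -- minutes between snapshots
    topnsql  => 100);  -- number of top SQL statements captured per snapshot
END;
/
```

The current settings can be checked in DBA_HIST_WR_CONTROL.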
We are already using SQL Profiles to apply optimizer parameters to this statement, so I can identify entries in DBA_HIST_SQLSTAT by the matching profile name, though I could also use FORCE_MATCHING_SIGNATURE. I will start by extracting the SQL statistics for all matching statements into a temporary working storage table, including the OTHER_XML from the SQL plan.
drop table my_stats purge;
create table my_stats as 
with s as (
 select s.dbid, s.sql_id, s.plan_hash_value, s.sql_profile, s.action
 ,      max(s.optimizer_cost) optimizer_cost
 ,      max(s.PX_SERVERS_EXECS_DELTA) px_servers
 ,      sum(ELAPSED_TIME_DELTA)/1e6 elapsed_time
 from   dba_hist_snapshot x
 ,      dba_hist_sqlstat s
 WHERE  s.SNAP_id = X.SNAP_id
 AND    s.dbid = x.dbid
 AND    s.instance_number = x.instance_number
 AND    s.module = 'GLPOCONS'
 and    s.sql_profile LIKE 'GLPOCONS_AD%B2%'
 and    s.plan_hash_value > 0
 group by s.dbid, s.sql_id, s.plan_hash_value, s.sql_profile, s.action
)
SELECT s.*
,      xmltype(other_xml) other_xml
FROM   s, dba_hist_sql_plan p
WHERE  p.dbid = s.dbid
and    p.sql_id = s.sql_id
and    p.plan_hash_value = s.plan_hash_value
and    p.other_xml IS NOT NULL
and    p.id = 1
/
select count(*) from my_stats 
/
I found it was necessary to break this up into two stages; otherwise I got ORA-32036: unsupported case for inlining of query name in WITH clause. This could be a manifestation of Bug 19684504 - ORA-32036 executing WITH statement with XML (Doc ID 19684504.8).
set pages 99 long 10
break on sql_profile skip 1
column sql_profile     format a17
column op              format 999
column dis             format 999
column par             format 999
column dep             format 999
column skp             format 999
column plan_hash_value format 9999999999 heading 'PHV'
column px_servers      format 9999       heading 'PX|Serv'
column ecost           format 9999       heading 'Elim|Cost'
column lcost           format 99999999   heading 'Ledger|Cost'
column ecard           format 99999      heading 'Elim|Card'
column lcard           format 99999999   heading 'Ledger|Card'
column optimizer_cost  format 99999999   heading 'Opt|Cost'
column elapsed_time    format 9999.99    heading 'Elapsed|Time'
column object_name     format a18 
column elim_obj        format a7         heading 'Bloom?' 
column epoptions       format a14        heading 'Join|Options'
column epoperation     format a16        heading 'Join|Operation'
column eoptions        format a14        heading 'Elim|Options'
column loptions        format a20        heading 'Ledger|Options'
column options         format a20
spool dmk
with y as (
select /*+MATERIALIZE*/ x.*
,      xt.*
,      p.object_type, p.object_name
,      p.operation, p.options, p.cost, p.cardinality
from   my_stats x
,      xmltable('/other_xml/display_map/row' PASSING x.other_xml
       COLUMNS "OP" NUMBER path '@op',
               "DIS" NUMBER path '@dis',
               "PAR" NUMBER path '@par',
               "DEP" NUMBER path '@dep',
               "SKP" NUMBER path '@skp'
       ) xt
,      dba_hist_sql_plan p
where  xt.skp = 0
and    p.dbid = x.dbid
and    p.sql_id = x.sql_id
and    p.plan_hash_Value = x.plan_hash_value
and    p.id = xt.op
)
select x.sql_profile, x.sql_id, x.plan_hash_value, x.px_servers
,      e.object_name, e.options eoptions, e.cardinality ecard, e.cost ecost
,      l.options loptions, l.cardinality lcard, l.cost lcost
,      ep.operation epoperation, ep.object_name elim_obj
,      x.optimizer_cost, x.elapsed_time
From   my_stats x
,      y e
,      y l
,      y ep
WHERE  e.dbid = x.dbid
And    e.sql_id = x.sql_id
And    e.plan_hash_Value = x.plan_hash_value
And    e.object_name like 'PS_ELIM_CF_SEL%'
And    e.object_type = 'TABLE'
AND    l.dbid = x.dbid
And    l.sql_id = x.sql_id
And    l.plan_hash_Value = x.plan_hash_value
And    l.object_name = 'PS_LEDGER'
And    l.object_type = 'TABLE'
And    ep.dbid = x.dbid
And    ep.sql_id = x.sql_id
And    ep.plan_hash_Value = x.plan_hash_value
And    ep.dis = e.par /*parent of display line*/
ORDER by x.sql_profile, e.cardinality, x.optimizer_cost, x.elapsed_time
/
spool off
The SQL query to profile the plans will be different for each set of plans that I want to investigate; the query shown in this blog was developed for the plan I am investigating here. I am interested in three aspects of the execution plan: the access method for each of the two tables, and the method by which they are joined. So, I have to join the SQL statistics sub-query (X) back to the execution plan sub-query (Y) three times, selecting a different row from the plan each time.
  • E: the table access of the PS_ELIM_CF_SEL% table, either a FULL scan or BY ROWID if an index was used.
  • L: the table access of the PS_LEDGER table, again either a FULL scan or BY ROWID if an index was used.
  • EP: the parent of the table access of PS_ELIM_CF_SEL%. This will be either a HASH JOIN or the creation of the Bloom filter (:BF0000). If I order the results by the cardinality of PS_ELIM_CF_SEL%, I can see how and when the plan changes as the number of rows retrieved from it (the cardinality) changes.
                                               PX                    Elim             Elim  Elim Ledger                  Ledger    Ledger Join                           Opt Elapsed
SQL_PROFILE       SQL_ID                PHV  Serv OBJECT_NAME        Options          Card  Cost Options                   Card      Cost Operation        Bloom?       Cost    Time
----------------- ------------- ----------- ----- ------------------ -------------- ------ ----- -------------------- --------- --------- ---------------- ------- --------- -------
GLPOCONS_ADPB2001 b7hy0jfppdvn2    42743070     0 PS_ELIM_CF_SEL2001 STORAGE FULL      13   138 BY LOCAL INDEX ROWID         1     18249 NESTED LOOPS                237379      .40
…
                  9yz896xyn5aky    42743070     0 PS_ELIM_CF_SEL2001 STORAGE FULL      51   138 BY LOCAL INDEX ROWID         1     18249 NESTED LOOPS                930851     2.62
                  cbnz1jumm0x23    42743070     0 PS_ELIM_CF_SEL2001 STORAGE FULL      52   138 BY LOCAL INDEX ROWID         1     18249 NESTED LOOPS                949101      .66
…
                  cmajgmuyc3hc3    42743070     0 PS_ELIM_CF_SEL2001 STORAGE FULL      92   138 BY LOCAL INDEX ROWID         1     18249 NESTED LOOPS               1679071    21.89
                  3sbkyfwy6xk02  3968216348     0 PS_ELIM_CF_SEL2001 STORAGE FULL      92   138 STORAGE FULL                 1     18249 HASH JOIN                  1679071    51.95
                  27pxj6qu49zzr  3968216348     0 PS_ELIM_CF_SEL2001 STORAGE FULL      92   138 STORAGE FULL                 1     18249 HASH JOIN                  1679071    52.49
                  8nfbzan3hbzdg  3968216348     0 PS_ELIM_CF_SEL2001 STORAGE FULL      92   138 STORAGE FULL                 1     18249 HASH JOIN                  1679071    67.49
                  827sykfucq01t  3968216348     0 PS_ELIM_CF_SEL2001 STORAGE FULL      93   138 STORAGE FULL                 1     18249 HASH JOIN                  1697321    87.65
                  fmd3r4848nypk  3968216348     0 PS_ELIM_CF_SEL2001 STORAGE FULL      93   138 STORAGE FULL                 1     18249 HASH JOIN                  1697321    89.39
                  34h7j9rqf8y73  2660510816     0 PS_ELIM_CF_SEL2001 STORAGE FULL      97   138 BY LOCAL INDEX ROWID         1   1763685 NESTED LOOPS               1763970     1.54
                  0cfqyuz79qd09   386495123     0 PS_ELIM_CF_SEL2001 STORAGE FULL      98   138 STORAGE FULL          24054654   1763685 PART JOIN FILTER :BF0000   1763970    21.32
                  ff5hf8zpwhmd2   386495123     0 PS_ELIM_CF_SEL2001 STORAGE FULL      98   138 STORAGE FULL          24054654   1763685 PART JOIN FILTER :BF0000   1763970    26.64
                  9j0m5gq5aqssz   386495123     0 PS_ELIM_CF_SEL2001 STORAGE FULL      99   138 STORAGE FULL          24054654   1763685 PART JOIN FILTER :BF0000   1763970    34.28
                  bzbqrc7bw768q    42743070     0 PS_ELIM_CF_SEL2001 STORAGE FULL     102   138 BY LOCAL INDEX ROWID         1     18249 NESTED LOOPS               1861564      .76
                  bjjx0fxmtqzzt    42743070     0 PS_ELIM_CF_SEL2001 STORAGE FULL     102   138 BY LOCAL INDEX ROWID         1     18249 NESTED LOOPS               1861564     3.66
                  aqcxt7par2a22  2660510816     0 PS_ELIM_CF_SEL2001 STORAGE FULL     104   138 BY LOCAL INDEX ROWID         1   1763685 NESTED LOOPS               1763970     9.57
                  4cfsq52xzftpz   386495123     0 PS_ELIM_CF_SEL2001 STORAGE FULL     104   138 STORAGE FULL          24054654   1763685 PART JOIN FILTER :BF0000   1763970    25.39
                  76hq9zbfrm599   386495123     0 PS_ELIM_CF_SEL2001 STORAGE FULL     104   138 STORAGE FULL          24054654   1763685 PART JOIN FILTER :BF0000   1763970    26.11
                  2b8ck8kpyrvhg   386495123     0 PS_ELIM_CF_SEL2001 STORAGE FULL     104   138 STORAGE FULL          24054654   1763685 PART JOIN FILTER :BF0000   1763970    35.71
                  6gtu9tauhp55y   386495123     0 PS_ELIM_CF_SEL2001 STORAGE FULL     104   138 STORAGE FULL          24054654   1763685 PART JOIN FILTER :BF0000   1763970    39.77
…
                  88kyxqx2qbjxg   386495123     0 PS_ELIM_CF_SEL2001 STORAGE FULL    1021   138 STORAGE FULL          24054654   1763685 PART JOIN FILTER :BF0000   1763970    21.73
…
                  dh39mtw8rga5s   386495123     0 PS_ELIM_CF_SEL2001 STORAGE FULL    2113   138 STORAGE FULL          24054654   1763685 PART JOIN FILTER :BF0000   1763970    40.26
I can see that the optimizer goes backwards and forwards between different plans as the cardinality increases. Nevertheless, I can make some informed decisions about how to adjust the optimizer parameters for this statement with hints introduced by a SQL Profile.
  • I can move the point at which the optimizer switches from the nested loop to the full scan and Bloom filter to a lower cardinality by increasing the cost of the index with a higher value of optimizer_index_cost_adj.
  • I can enable automatic parallelism with parallel_degree_policy, scale the calculated degree of parallelism with parallel_degree_level, but limit the degree of parallelism with parallel_degree_limit. The threshold for parallelism is set with parallel_min_time_threshold; it can be chosen so that parallelism kicks in at a slightly lower cardinality than the switch to the full scan.
  • The settings can be tested using EXPLAIN PLAN FOR with the cardinality hint to set the number of rows returned from the PS_ELIM_CF_SEL% table.
explain plan for
SELECT /*+cardinality(s 300)
NO_EXPAND 
OPT_PARAM('parallel_degree_policy','AUTO')
OPT_PARAM('parallel_min_time_threshold',220)
OPT_PARAM('parallel_degree_limit',40)
OPT_PARAM('optimizer_index_cost_adj',10912)
*/
L.BUSINESS_UNIT,L.ACCOUNT,L.DEPTID,L.PRODUCT,L.BUDGET_REF
…
/
set echo off trimspool on pages 999 lines 200 autotrace off
select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION -PREDICATE -OUTLINE'));
When satisfactory values have been determined, they can be passed back into the SQL Profile.
…
h := SYS.SQLPROF_ATTR(
q'[BEGIN_OUTLINE_DATA]',
q'[OPT_PARAM('parallel_degree_policy','AUTO')]',
q'[OPT_PARAM('parallel_min_time_threshold',60)]',
q'[OPT_PARAM('parallel_degree_limit',40)]',
q'[OPT_PARAM('optimizer_index_cost_adj',10912)]',
q'[NO_EXPAND]',
…
q'[END_OUTLINE_DATA]');
…
DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
sql_text    => sql_txt,
profile     => h,
name        => 'GLPOCONS_'||i.id,
…
replace     => TRUE,
force_match => TRUE /* TRUE:FORCE (match even when different literals in SQL). FALSE:EXACT (similar to CURSOR_SHARING) */ 
);
…

Friday, March 02, 2018

How to Identify Active Lines in an Adaptive Execution Plan in DBA_HIST_SQL_PLAN

When investigating performance problems, I spend a lot of my time profiling ASH and AWR data. I sometimes want to join back to captured plans in DBA_HIST_SQL_PLAN to see how an object was accessed, which index was used, or how many rows the optimizer thought it would get.
From Oracle 12c, we can get adaptive execution plans, where the optimizer chooses between alternative sub-plans at runtime. Mostly, it is a choice between a nested loop and a hash join of two tables.
For example, in the execution plan below, DBMS_XPLAN includes a note explaining that inactive plan lines are marked with a dash. Thus we can see that the nested-loop lookup of PS_LEDGER via the PSCLEDGER index was rejected in favour of a Bloom filter on a full scan of the ledger table.
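Plans captured by AWR can be reprinted with these markers using DBMS_XPLAN.DISPLAY_AWR, assuming your version honours the +ADAPTIVE format modifier. A sketch, using the SQL ID and plan hash value from this example:

```sql
-- Reprint a plan captured in AWR, marking inactive adaptive lines with '-'
select * from table(dbms_xplan.display_awr(
         sql_id          => 'bk26thygs9c8n',
         plan_hash_value => 38031479,
         format          => 'ADVANCED +ADAPTIVE'));
```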
SQL_ID bk26thygs9c8n

Plan hash value: 38031479
-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                    |       |       | 18275 (100)|       |          |       |
|   1 |  SORT GROUP BY                         |                    |     1 |   135 | 18275   (1)| 00:00:01 |       |       |
|-  2 |   HASH JOIN                            |                    |     1 |   135 | 18274   (1)| 00:00:01 |       |       |
|   3 |    NESTED LOOPS                        |                    |     1 |   135 | 18274   (1)| 00:00:01 |       |       |
|-  4 |     STATISTICS COLLECTOR               |                    |       |       |            |       |          |       |
|   5 |      TABLE ACCESS STORAGE FULL         | PS_ELIM_CF_SEL2002 |     1 |    59 |    25   (0)| 00:00:01 |       |       |
|   6 |     PARTITION RANGE OR                 |                    |     1 |    76 | 18249   (1)| 00:00:01 |KEY(OR)|KEY(OR)|
|   7 |      PARTITION LIST ITERATOR           |                    |     1 |    76 | 18249   (1)| 00:00:01 |   KEY |   KEY |
|   8 |       TABLE ACCESS BY LOCAL INDEX ROWID| PS_LEDGER          |     1 |    76 | 18249   (1)| 00:00:01 |   KEY |   KEY |
|   9 |        INDEX RANGE SCAN                | PSCLEDGER          |     1 |       | 18228   (1)| 00:00:01 |   KEY |   KEY |
|- 10 |    PARTITION RANGE OR                  |                    |     1 |    76 | 18249   (1)| 00:00:01 |KEY(OR)|KEY(OR)|
|- 11 |     PARTITION LIST JOIN-FILTER         |                    |     1 |    76 | 18249   (1)| 00:00:01 |:BF0000|:BF0000|
|- 12 |      TABLE ACCESS STORAGE FULL         | PS_LEDGER          |     1 |    76 | 18249   (1)| 00:00:01 |   KEY |   KEY |
-----------------------------------------------------------------------------------------------------------------------------
…
Note
-----
…
   - this is an adaptive plan (rows marked '-' are inactive)
When I come to look at this plan in DBA_HIST_SQL_PLAN, I find all 14 lines, but there is no column in the view that indicates whether a line is active in the adaptive plan.
select id, operation, options, object_name, cardinality, cost from dba_hist_sql_plan 
where sql_id = 'bk26thygs9c8n' and plan_hash_value = 38031479

        ID OPERATION                      OPTIONS                        OBJECT_NAME        CARDINALITY       COST
---------- ------------------------------ ------------------------------ ------------------ ----------- ----------
         0 SELECT STATEMENT                                                                                1529564
         1 SORT                           GROUP BY                                                    1    1529564
         2 HASH JOIN                                                                                359    1529563
         3 PART JOIN FILTER               CREATE                         :BF0000                    359    1529563
         4 NESTED LOOPS                                                                             359    1529563
         5 STATISTICS COLLECTOR
         6 TABLE ACCESS                   STORAGE FULL                   PS_ELIM_CF_SEL2002         360        172
         7 PARTITION RANGE                OR                                                          1    1529289
         8 PARTITION LIST                 ITERATOR                                                    1    1529289
         9 TABLE ACCESS                   BY LOCAL INDEX ROWID           PS_LEDGER                    1    1529289
        10 INDEX                          RANGE SCAN                     PSCLEDGER
        11 PARTITION RANGE                OR                                                   16845834    1529289
        12 PARTITION LIST                 JOIN-FILTER                                          16845834    1529289
        13 TABLE ACCESS                   STORAGE FULL                   PS_LEDGER             16845834    1529289
The information about whether a line in the execution plan is active, and whether it should be displayed by DBMS_XPLAN, is encoded in the display map, an XML structure stored in the OTHER_XML column on line 1 of the plan.
<other_xml>
  <info type="nodeid/pflags">000000000001</info>
…
  <info type="adaptive_plan" note="y">yes</info>
  <outline_data>
    <hint><![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]></hint>
    <hint><![CDATA[OPTIMIZER_FEATURES_ENABLE('12.1.0.2')]]></hint>
    <hint><![CDATA[DB_VERSION('12.1.0.2')]]></hint>
    <hint><![CDATA[OPT_PARAM('_unnest_subquery' 'false')]]></hint>
…
  </outline_data>
  <display_map>
    <row op="1" dis="1" par="0" prt="0" dep="1" skp="0"/>
    <row op="2" dis="2" par="1" prt="0" dep="2" skp="0"/>
    <row op="3" dis="3" par="2" prt="0" dep="3" skp="0"/>
    <row op="4" dis="3" par="3" prt="0" dep="3" skp="1"/>
    <row op="5" dis="3" par="3" prt="0" dep="3" skp="1"/>
    <row op="6" dis="4" par="3" prt="0" dep="4" skp="0"/>
    <row op="7" dis="4" par="3" prt="4" dep="3" skp="1"/>
    <row op="8" dis="4" par="3" prt="4" dep="3" skp="1"/>
    <row op="9" dis="4" par="3" prt="3" dep="3" skp="1"/>
    <row op="10" dis="4" par="3" prt="3" dep="3" skp="1"/>
    <row op="11" dis="5" par="2" prt="5" dep="3" skp="0"/>
    <row op="12" dis="6" par="5" prt="6" dep="4" skp="0"/>
    <row op="13" dis="7" par="6" prt="6" dep="5" skp="0"/>
  </display_map>
</other_xml>
OP is the plan line ID. SKP=0 indicates an active line. This data can be extracted as a set of rows and columns with the XMLTABLE function.
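As an aside, the same OTHER_XML document records whether a captured plan is adaptive at all, in the adaptive_plan info element shown above. A hedged sketch that flags such plans:

```sql
-- Flag captured plans that are adaptive, by reading the
-- adaptive_plan info element from OTHER_XML on plan line 1.
select p.sql_id, p.plan_hash_value
,      xmlcast(xmlquery('/other_xml/info[@type="adaptive_plan"]'
              passing xmltype(p.other_xml) returning content)
              as varchar2(3)) adaptive_plan
from   dba_hist_sql_plan p
where  p.sql_id = 'bk26thygs9c8n'
and    p.id = 1
and    p.other_xml is not null
/
```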
My thanks to Tim Hall. I was only able to work this out with the aid of his excellent article, XMLTABLE: Convert XML Data into Rows and Columns using SQL, and the associated video.
(Update 5.3.2018): Since publishing this, I have also found Martin Bach's blog, Adaptive plans and v$sql_plan and related views.
column plan_hash_value format 9999999999 heading 'PHV'
column op          format 999
column dis         format 999
column par         format 999
column dep         format 999
column skp         format 999
column object_name format a20
column operation   format a20
column options     format a20
column partition_start format a10 heading 'PStart'

with x as (
select  sql_id, plan_hash_value
,  xmltype(other_xml) xml_data
from dba_hist_sql_plan p
where sql_id IN ('bk26thygs9c8n')
and id = 1
and other_xml IS NOT NULL
), y as (
select sql_id, plan_hash_value
, xt.*
from x
,    xmltable('/other_xml/display_map/row' PASSING x.xml_data
     COLUMNS 
       "OP" NUMBER path '@op',
       "DIS" NUMBER path '@dis',
       "PAR" NUMBER path '@par',
       "DEP" NUMBER path '@dep',
       "SKP" NUMBER path '@skp'
     ) xt
where xt.skp = 0
)
select y.* 
, p.object_name, p.operation, p.options, p.partition_start
from y
, dba_hist_sql_plan p
where p.sql_id = y.sql_id
and p.plan_hash_value = y.plan_hash_value
and y.op = p.id
/
  • OTHER_XML is a CLOB, so it must be converted to an XMLTYPE before being passed to XMLTABLE.
  • We effectively make a Cartesian product between the row source of SQL plans in sub-query x and XMLTABLE().
  • The XML structure is attributed rather than tagged, so you need the '@' in the path in the COLUMNS clause.
  • Inactive plan lines are filtered out with a criterion on SKP.
Now I have a SQL query that returns only the active rows in the execution plan. I can use the same technique elsewhere to profile statements by aspects of their execution plans.
SQL_ID                PHV   OP  DIS  PAR  DEP  SKP OBJECT_NAME          OPERATION            OPTIONS              PStart
------------- ----------- ---- ---- ---- ---- ---- -------------------- -------------------- -------------------- ----------
bk26thygs9c8n    38031479    1    1    0    1    0                      SORT                 GROUP BY
bk26thygs9c8n    38031479    2    2    1    2    0                      HASH JOIN
bk26thygs9c8n    38031479    3    3    2    3    0 :BF0000              PART JOIN FILTER     CREATE
bk26thygs9c8n    38031479    6    4    3    4    0 PS_ELIM_CF_SEL2002   TABLE ACCESS         STORAGE FULL
bk26thygs9c8n    38031479   11    5    2    3    0                      PARTITION RANGE      OR                   KEY(OR)
bk26thygs9c8n    38031479   12    6    5    4    0                      PARTITION LIST       JOIN-FILTER          :BF0000
bk26thygs9c8n    38031479   13    7    6    5    0 PS_LEDGER            TABLE ACCESS         STORAGE FULL         KEY
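The same display-map technique also works against the cursor cache, since V$SQL_PLAN exposes OTHER_XML too. A sketch, assuming the cursor is still in the shared pool; note the join is on CHILD_NUMBER because the cache can hold several child cursors for one SQL ID:

```sql
-- Active plan lines for cursors still in the cursor cache.
with x as (
  select sql_id, child_number, xmltype(other_xml) xml_data
  from   v$sql_plan
  where  sql_id = 'bk26thygs9c8n'
  and    id = 1
  and    other_xml is not null
)
select x.sql_id, x.child_number, xt.op, xt.dis, xt.par, xt.skp
from   x
,      xmltable('/other_xml/display_map/row' passing x.xml_data
       columns "OP"  number path '@op',
               "DIS" number path '@dis',
               "PAR" number path '@par',
               "SKP" number path '@skp') xt
where  xt.skp = 0
/
```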
I demonstrate a practical use of this technique in my next blog.