DOAG 2019, Nuremburg

Tuesday, October 15, 2019

Reading the Active Session History Compressed Export File in eDB360/SQLd360 as an External Table

I am a regular user of SQLDB360 (the single distribution for Carlos Sierra's eBD360, and Mauro Pagano's SQLd360 tools) for remote performance analysis.  eDB360 reports on the whole database, SQLd360 reports on a single SQL ID.  eDB360 also runs SQLd360 reports for the top SQL statements.  Both tools extract ASH data to a flat-file.  Originally, it was intended that these were loaded with the eAdam utility, but eDB360 no longer executes the eAdam scripts itself.
The eDB360/SQLd360 output zip files contain an export of the ASH data that is spooled from the database in SQL Plus, compressed with gzip, and put into a tarball.  You should find a single .tar file that contains several gzipped files.  In eDB360, dba_hist_active_sess_history.txt.gz is simply the result of:
SELECT * FROM dba_hist_active_sess_history
In SQLd360 it will be for just the specified SQL_ID:
SELECT * FROM dba_hist_active_sess_history
WHERE sql_id = '…'
The exact structure of the file varies with the version of Oracle, as more columns have been added to the view with successive releases.  There used to be an eAdam structure control file from which the DDL for a repository can be generated, but it is no longer generated.  However, it is easy to deal with this manually because many of the reports query DBA_HIST_ACTIVE_SESS_HISTORY and you will find a description of the view in each report.

Loading Compressed File as an External Table

Two database directories must be created in the database used to read the data file.
GRANT CREATE ANY DIRECTORY TO scott;
CREATE OR REPLACE DIRECTORY edb360 as '/media/sf temp';
CREATE OR REPLACE DIRECTORY exec_dir AS '/usr/bin';
GRANT EXECUTE ON DIRECTORY exec_dir TO scott;
  • Directory edb360 points to where the data file is located.  
  • The external table will load the compressed zip file directly, uncompressing it on the fly, without writing an uncompressed version of the file to disk, using the preprocessor facility.  Directory exec_dir is the location of the Unix zcat executable. 
    • N.B. It is important that this is the actual directory where the executable resides, not the linked /bin directory
Now you can create an external table.  I have created it in the SCOTT schema rather than SYS.
There are a few things to bear in mind
  1. DISABLE_DIRECTORY_LINK_CHECK is used to suppress the check that the data file being loaded is not be referenced from a directory link.  In my case, this is necessary because I am referencing a directory on a host machine from within a Linux VM running in VirtualBox.  However, this does not suppress the check for files executed by the preprocessor.  exec_dir must not point to a linked directory.
  2. The compressed data file is pre-processed (effectively piped) through the zcat command.  This is like importing a compressed dump file through a Unix FIFO
  3. The # character has had to be removed from all column names.
  4. The format of date fields in the data file must be specified explicitly to convert them properly.
  5. The name of the compressed file to be imported is specified in the LOCATION clause.
  6. The reject limit has been deliberately set to 100.  A small non-zero value.  The file is exported from DBA_HIST_ACTIVE_SESS_HISTORY.  The actual columns change from version to version, so if the definition of the external table does not match the definition in the database you may get errors.  Limiting the rejections to 100 means that the .log and .bad files are easier to manage when resolving the error.  When the definitions match no rows should be rejected.
DROP TABLE scott.ash_load PURGE;

CREATE TABLE scott.ash_load
(SNAP_ID NUMBER
,DBID NUMBER
,INSTANCE_NUMBER NUMBER
,SAMPLE_ID NUMBER
,SAMPLE_TIME TIMESTAMP(3)
,SESSION_ID NUMBER
,SESSION_SERIAL NUMBER
,SESSION_TYPE VARCHAR2(10)
,FLAGS NUMBER
,USER_ID NUMBER
,SQL_ID VARCHAR2(13)
,IS_SQLID_CURRENT VARCHAR2(1)
,SQL_CHILD_NUMBER NUMBER
,SQL_OPCODE NUMBER
,SQL_OPNAME VARCHAR2(64)
,FORCE_MATCHING_SIGNATURE NUMBER
,TOP_LEVEL_SQL_ID VARCHAR2(13)
,TOP_LEVEL_SQL_OPCODE NUMBER
,SQL_PLAN_HASH_VALUE NUMBER
,SQL_PLAN_LINE_ID NUMBER
,SQL_PLAN_OPERATION VARCHAR2(64)
,SQL_PLAN_OPTIONS VARCHAR2(64)
,SQL_EXEC_ID NUMBER
,SQL_EXEC_START DATE
,PLSQL_ENTRY_OBJECT_ID NUMBER
,PLSQL_ENTRY_SUBPROGRAM_ID NUMBER
,PLSQL_OBJECT_ID NUMBER
,PLSQL_SUBPROGRAM_ID NUMBER
,QC_INSTANCE_ID NUMBER
,QC_SESSION_ID NUMBER
,QC_SESSION_SERIAL NUMBER
,PX_FLAGS NUMBER
,EVENT VARCHAR2(64)
,EVENT_ID NUMBER
,SEQ NUMBER
,P1TEXT VARCHAR2(64)
,P1 NUMBER
,P2TEXT VARCHAR2(64)
,P2 NUMBER
,P3TEXT VARCHAR2(64)
,P3 NUMBER
,WAIT_CLASS VARCHAR2(64)
,WAIT_CLASS_ID NUMBER
,WAIT_TIME NUMBER
,SESSION_STATE VARCHAR2(7)
,TIME_WAITED NUMBER
,BLOCKING_SESSION_STATUS VARCHAR2(11)
,BLOCKING_SESSION NUMBER
,BLOCKING_SESSION_SERIAL NUMBER
,BLOCKING_INST_ID NUMBER
,BLOCKING_HANGCHAIN_INFO VARCHAR2(1)
,CURRENT_OBJ NUMBER
,CURRENT_FILE NUMBER
,CURRENT_BLOCK NUMBER
,CURRENT_ROW NUMBER
,TOP_LEVEL_CALL NUMBER
,TOP_LEVEL_CALL_NAME VARCHAR2(64)
,CONSUMER_GROUP_ID NUMBER
,XID RAW(8)
,REMOTE_INSTANCE NUMBER
,TIME_MODEL NUMBER
,IN_CONNECTION_MGMT VARCHAR2(1)
,IN_PARSE VARCHAR2(1)
,IN_HARD_PARSE VARCHAR2(1)
,IN_SQL_EXECUTION VARCHAR2(1)
,IN_PLSQL_EXECUTION VARCHAR2(1)
,IN_PLSQL_RPC VARCHAR2(1)
,IN_PLSQL_COMPILATION VARCHAR2(1)
,IN_JAVA_EXECUTION VARCHAR2(1)
,IN_BIND VARCHAR2(1)
,IN_CURSOR_CLOSE VARCHAR2(1)
,IN_SEQUENCE_LOAD VARCHAR2(1)
,CAPTURE_OVERHEAD VARCHAR2(1)
,REPLAY_OVERHEAD VARCHAR2(1)
,IS_CAPTURED VARCHAR2(1)
,IS_REPLAYED VARCHAR2(1)
,SERVICE_HASH NUMBER
,PROGRAM VARCHAR2(64)
,MODULE VARCHAR2(64)
,ACTION VARCHAR2(64)
,CLIENT_ID VARCHAR2(64)
,MACHINE VARCHAR2(64)
,PORT NUMBER
,ECID VARCHAR2(64)
,TM_DELTA_TIME NUMBER
,TM_DELTA_CPU_TIME NUMBER
,TM_DELTA_DB_TIME NUMBER
,DELTA_TIME NUMBER
,DELTA_READ_IO_REQUESTS NUMBER
,DELTA_WRITE_IO_REQUESTS NUMBER
,DELTA_READ_IO_BYTES NUMBER
,DELTA_WRITE_IO_BYTES NUMBER
,DELTA_INTERCONNECT_IO_BYTES NUMBER
,PGA_ALLOCATED NUMBER
,TEMP_SPACE_ALLOCATED NUMBER
)
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
 DEFAULT DIRECTORY temp
 ACCESS PARAMETERS 
 (RECORDS DELIMITED BY newline 
  DISABLE_DIRECTORY_LINK_CHECK 
  PREPROCESSOR exec_dir:'zcat' 
  FIELDS TERMINATED BY '<,>'
  MISSING FIELD VALUES ARE NULL
  REJECT ROWS WITH ALL NULL FIELDS
  NULLIF = BLANKS
(SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID
,SAMPLE_TIME CHAR(80) date_format TIMESTAMP MASK "YYYY-MM-DD/HH24:mi:ss.ff"
,SESSION_ID,SESSION_SERIAL,SESSION_TYPE
,FLAGS,USER_ID
,SQL_ID
,IS_SQLID_CURRENT
,SQL_CHILD_NUMBER
,SQL_OPCODE,SQL_OPNAME
,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE
,SQL_PLAN_HASH_VALUE,SQL_PLAN_LINE_ID
,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS
,SQL_EXEC_ID
,SQL_EXEC_START CHAR(80) date_format TIMESTAMP MASK "YYYY-MM-DD/HH24:mi:ss.ff"
,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID
,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
,QC_INSTANCE_ID,QC_SESSION_ID,QC_SESSION_SERIAL,PX_FLAGS
,EVENT,EVENT_ID,SEQ
,P1TEXT,P1,P2TEXT,P2,P3TEXT,P3
,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME
,SESSION_STATE,TIME_WAITED
,BLOCKING_SESSION_STATUS,BLOCKING_SESSION,BLOCKING_SESSION_SERIAL
,BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO
,CURRENT_OBJ,CURRENT_FILE,CURRENT_BLOCK,CURRENT_ROW
,TOP_LEVEL_CALL,TOP_LEVEL_CALL_NAME
,CONSUMER_GROUP_ID,XID,REMOTE_INSTANCE,TIME_MODEL
,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION
,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD
,CAPTURE_OVERHEAD,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED
,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT,ECID
,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME
,DELTA_TIME
,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS
,DELTA_READ_IO_BYTES,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES
,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED
))
LOCATION ('dba_hist_active_sess_history.txt.gz')
) REJECT LIMIT 100
/
Now the external table can be queried in exactly the same way that you would query DBA_HIST_ACTIVE_SESS_HISTORY in order to profile DB time, although bear in mind than an external table can only be full scanned.
set timi on lines 200 pages 99
COLUMN force_matching_signature FORMAT 99999999999999999999
COLUMN min(sample_time) FORMAT a26
COLUMN max(sample_time) FORMAT a26
SELECT min(sample_time), max(sample_time), COUNT(*) samples
FROM scott.ash_load
/

MIN(SAMPLE_TIME)           MAX(SAMPLE_TIME)              SAMPLES
-------------------------- -------------------------- ----------
21-JUN-19 05.00.36.115 AM  21-JUL-19 01.00.00.823 PM     1213333

Wednesday, October 09, 2019

Purging SQL Statements and Execution Plans from AWR

A previous blog discussed the problem posed by old execution plans that were previously captured by AWR being used by dbms_xplan.display_awr() to describe current executions plans for the same execution plan but displaying the old costs.
Tanel Poder commented that there is a TIMESTAMP column on both V$SQL_PLAN and DBA_HIST_SQL_PLAN so you know when the plans were generated, but this is not used to purge plans from AWR.  It prompted me to look into purging of statements and plans in more detail.
There has been a long-standing problem of old plans not being purged at all from AWR.  There are various notes on the Oracle support site that refer to this behaviour.   However, it appears to have been addressed by at least Oracle 19c. 
There is no snapshot ID (SNAP_ID) on either DBA_HIST_SQLTEXT or DBA_HIST_SQL_PLAN but this column does exist on the underlying tables WRH$_SQLTEXT and WRH$_SQL_PLAN, where it is used to restrict the purge process to rows created before the retention period.  SQL statements and plans collected prior to the AWR retention limit, for which there are no current SQL statistics, are purged.  However, SQL statements are not purged if SQL baselines are present for that snapshot.
Even so, an old plan can still be retained in the AWR, if the same SQL statement is still submitted and the same plan is still generated by the optimiser, and the statement is captured by the AWR snapshot either because it is in the top statements, or the SQL is 'colored'.  dbms_xplan.display_awr() will show the plan with the original costs.  It will not be purged because there are SQL statistics more recent than the AWR retention limit.

Oracle Notes

There are various references on the Oracle support website to the lack of purging resulting in significant growth in the size of the AWR tables, including bugs in old releases.

dbms_workload_repository.purge_sql_details()

SQL statements and plans are automatically purged along with the AWR snapshots when they reach the retention limit.  There is also a procedure for manual purge.  Session trace including bind variable values of the purge procedure reveals the behaviour of the purge process.  It purges WRH$_SQLTEXT, WRH$_SQL_PLAN and WRH$_SQL_BIND_METADATA for SNAP_IDs in the range 0 to one less than the oldest snapshot not beyond the retention limit.  When purging manually, it is also possible to set a maximum number of rows to be purged from each table.
select min(snap_id) 
from   wrm$_snapshot  
where  dbid = :dbid 
and    (cast(SYSTIMESTAMP as TIMESTAMP) - end_interval_time) <= :retention
SQL Statements are not purged if there are either SQL statistics for it in AWR or a SQL Baseline on it.
delete from (
  select *       
  from  wrh$_sqltext  tab  
  where tab.dbid = :dbid 
  and :beg_snap <= tab.snap_id and tab.snap_id < :end_snap 
  and not exists (
    select 1
    from   wri$_sqltext_refcount  refc  
    where  refc.dbid = tab.dbid 
    and    refc.sql_id = tab.sql_id
    and    refc.con_dbid = tab.con_dbid and refc.ref_count > 0) 
  and not exists (
    select 1  
    from   wrh$_sqlstat  s         
    where  s.dbid = :dbid and s.con_dbid = tab.con_dbid 
    and    s.sql_id = tab.sql_id))
where rownum <= :bind1
Execution plans are not purged if there are SQL statistics for it in AWR.  Statistics older than the retention limit will have been purged by this stage.
delete from (
  select *       
  from  wrh$_sql_plan  tab  
  where tab.dbid = :dbid 
  and :beg_snap <= tab.snap_id and tab.snap_id < :end_snap 
  and not exists (
    select 1           
    from   wrh$_sqlstat s          
    where  s.dbid = :dbid and s.con_dbid = tab.con_dbid 
    and    s.sql_id = tab.sql_id))
where rownum <= :bind1
dbms_workload_repository.drop_snapshot_range() behaves similarly but does not need to test for SQL statistics because it is deleting them in that range.

Comparison of Optimizer Cost of Captured Plans with Optimizer Cost of SQL Statistics

It is possible to compare the optimizer cost on SQL statistics with the optimizer cost of the captured plan and see where they have diverged.  In this example, I am looking for an absolute difference in cost of 10 and also of at least 10%.
set lines 140 pages 99
col costpctdiff heading 'Cost|%Diff' format 99999 
col costdiff heading 'Cost|Diff' format 99999999 
col plan_hash_value heading 'SQL Plan|Hash Value'
col child_number heading 'Child|No.' format 9999
col inst_id heading 'Inst|ID' format 999
col hcost heading 'AWR|Cost' format 9999999
col ccost heading 'Cursor|Cost' format 9999999
col htimestamp heading 'AWR|Timestamp'
col ctimestamp heading 'Cursor|Timestamp'
col end_interval_time format a26
col snap_id heading 'Snap|ID' format 99999
col awr_cost format 99999999
col optimizer_Cost heading 'Opt.|Cost' format 99999999
col optimizer_env_hash_value heading 'Opt. Env.|Hash Value'
col num_stats heading 'Num|Stats' format 9999
alter session set nls_date_format = 'hh24:mi:ss dd.mm.yy';
break on plan_hash_value skip 1 on sql_id on dbid
ttitle 'compare AWR/recent plan costs'

with h as ( /*captured plan outside retention limit*/
select  p.dbid, p.sql_id, p.plan_hash_Value, max(cost) cost
,       max(p.timestamp) timestamp
from    dba_hist_sql_plan p 
,       dba_hist_wr_control c
where   p.dbid = c.dbid
and     p.cost>0
and     (p.object_owner != 'SYS' OR p.object_owner IS NULL)  --omit SYS owned objects
and     p.timestamp < sysdate-c.retention 
group by p.dbid, p.sql_id, p.plan_hash_value
), s as ( /*SQL statistics*/
select  t.dbid, t.sql_id, t.plan_hash_value, t.optimizer_env_hash_value
,       t.optimizer_cost
,       MIN(t.snap_id) snap_id
,       MIN(s.end_interval_time) end_interval_time
,       COUNT(*) num_stats
from    dba_hist_snapshot s
,       dba_hist_sqlstat t
where   s.dbid = t.dbid
and     s.snap_id = t.snap_id
and     t.optimizer_cost > 0
GROUP BY t.dbid, t.sql_id, t.plan_hash_value, t.optimizer_env_hash_value
,       t.optimizer_cost
), x as (
Select  NVL(h.dbid,s.dbid) dbid
,       NVL(h.sql_id,s.sql_id) sql_id
,       NVL(h.plan_hash_value,s.plan_hash_value) plan_hash_value
,       h.cost hcost, h.timestamp htimestamp
,       s.snap_id, s.end_interval_time
,       s.optimizer_env_hash_value, s.optimizer_cost
,       s.num_stats
,       s.optimizer_cost-h.cost costdiff
,       100*s.optimizer_cost/NULLIF(h.cost,0) costpctdiff
From    h join s
on      h.plan_hash_value = s.plan_hash_value
and     h.sql_id = s.sql_id
and     h.dbid = s.dbid
), y as (
SELECT  x.*
,       MAX(ABS(costpctdiff)) OVER (PARTITION BY dbid, sql_id, plan_hash_value) maxcostpctdiff
,       MAX(ABS(costdiff)) OVER (PARTITION BY dbid, sql_id, plan_hash_value) maxcostabsdiff
FROM    x
)
SELECT  dbid, sql_id, plan_hash_value, hcost, htimestamp
,       snap_id, end_interval_time, optimizer_env_hash_value, optimizer_cost, num_stats, costdiff, costpctdiff
FROM    y
WHERE   maxcostpctdiff>=10
And     maxcostabsdiff>=10
order by plan_hash_value,sql_id,end_interval_time
/
break on report
ttitle off
We can see that according to some statement statistics, for the same execution plan, costs vary over time.  They can go up or down.  This can be caused by different object statistics or different bind variables.  It is also perfectly possible to have different cursors produce the same execution plan but with a different cost.
                                                       compare AWR/recent plan costs

                           SQL Plan      AWR AWR                 Snap                             Opt. Env.      Opt.   Num      Cost   Cost
      DBID SQL_ID        Hash Value     Cost Timestamp             ID END_INTERVAL_TIME          Hash Value      Cost Stats      Diff  %Diff
---------- ------------- ---------- -------- ----------------- ------ -------------------------- ---------- --------- ----- --------- ------
…
3719366004 1852wpfdvbbdb  428098007     5759 06:04:57 07.01.19  66189 18-JUL-19 01.00.29.158 PM  2275008064     15760    22     10001    274
                                        5759 06:04:57 07.01.19  66191 18-JUL-19 02.00.30.019 PM  2275008064      6434    40       675    112
                                        5759 06:04:57 07.01.19  66196 18-JUL-19 04.30.41.024 PM  2275008064      6435    24       676    112
                                        5759 06:04:57 07.01.19  66197 18-JUL-19 05.00.26.793 PM  2275008064      6448    10       689    112
                                        5759 06:04:57 07.01.19  66217 19-JUL-19 03.00.09.245 AM  2275008064      6444     2       685    112
3719366004 dnxpk0a7tmk2c                5761 14:25:15 07.01.19  66247 19-JUL-19 06.00.59.483 PM  2275008064      4258     2     -1503     74
…
3719366004 882x55gh06a0k 4034560521   321903 09:02:37 01.04.19  66208 18-JUL-19 10.30.45.355 PM  2275008064    471900     6    149997    147
                                      321903 09:02:37 01.04.19  66285 20-JUL-19 01.00.01.061 PM  2275008064    471848     2    149945    147
                                      321903 09:02:37 01.04.19  66326 21-JUL-19 09.30.18.514 AM  2275008064    473557     2    151654    147
…
3719366004 fgyja8zfdf4s6 4057647380      188 22:02:32 11.02.19  66214 19-JUL-19 01.30.10.479 AM    50414560       496     4       308    264

           9ud99dpmkdtq6 4059558786        3 03:56:20 01.11.18  66191 18-JUL-19 02.00.30.019 PM    50414560       170   226       167   5667

           3d6ph39wbq0zq 4078589457    62946 06:23:42 01.05.18  66270 20-JUL-19 05.30.15.218 AM  3570113222    196673     2    133727    312

           9ud99dpmkdtq6 4082397478        3 20:36:53 03.10.18  66189 18-JUL-19 01.00.29.158 PM    50414560      2296   174      2293  76533

           6rkn5ty3mpdg8 4086831979     1096 11:54:01 22.01.19  66201 18-JUL-19 07.00.07.631 PM    50414560       549    16      -547     50
                                        1096 11:54:01 22.01.19  66212 19-JUL-19 12.30.00.506 AM    50414560      6055    16      4959    552
…

Summary

Old execution plans will not be purged from AWR if there are recent SQL statistics for the same SQL ID and execution plan hash value.  You will see the old plan costs if you extract the plan with
dbms_xplan.display_awr().

Is that a problem?
  • This does not have an impact on the behaviour of the database.  The data in AWR tells you what happened in the past.
  • I think it is most important to know how far you can trust display_awr().  The plan can be used as a guide to the path of execution, but not to the amount of work done.  It will show you the correct operations in the execution plan, but costs, bind variables, partition IDs and anything that does not contribute to the plan hash value should be approached with caution.  This is why I often profile DB time for a SQL statement by SQL_PLAN_LINE_ID so I can see how much time was spent in each operation.
Oracle Doc ID 1478615.1 implies that it is permitted to manually purge specific plans.  A subsequent AWR snapshot might then capture a new version of the same execution plan.  However, there are potential problems with this approach.
You may have an example where the old estimates for a plan don't make sense for a recent execution, but if you replace the plan with an updated one to get recent estimates that may not make sense for some activity in the past.
You can have different child cursors with different costs for the same execution plan.  Therefore, it is effectively arbitrary as to which plan would be captured into AWR, so there is no guarantee that you will get a more representative plan.  It is not even certain that costs for one cursor make sense for another.

I am not sure that there is any material benefit that is worth the effort of implementing something to additionally purge old execution plans.  I would welcome comments and opinions on this subject.