Tuesday, January 21, 2020

Analysing Database Time with Active Session History for Statements with Online Optimizer Statistics Gathering Operations

(Updated 30.1.2020) I have been looking into the performance of online statistics collection. This feature was introduced in 12c.  When statistics are collected online there is an extra OPTIMIZER STATISTICS GATHERING operation in the execution plan. However, I have noticed that the presence or absence of this operation does not change the hash value of the plan.  I am grateful to RobK for his comment pointing out that the full plan hash value does differ.
This has consequences for profiling DB time by execution plan line and then describing that line from a captured plan.

OPTIMIZER STATISTICS GATHERING Operation

When statistics are collected online during either a create-table-as-select operation or the initial direct-path insert into a new segment.  Below, I have different statements, whose execution plans have the same plan hash value, but actually differ. So, the differences are in areas that do not contribute to the plan hash value.
  • The first statement performs online statistics gathering, and so the plan includes the OPTIMIZER STATISTICS GATHERING operation, the second does not.
  • Note also that the statements insert into different tables, and that does not alter the plan hash value either. However, if the data was queried from different tables that would have produced a different plan hash value.
SQL_ID  c2dy3rmnqp7d7, child number 0
-------------------------------------
insert /*+APPEND*/ into T2 select * from t1

Plan hash value: 1069440229
-----------------------------------------------------------------------------------------
| Id  | Operation                        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |      |       |       |   187K(100)|          |
|   1 |  LOAD AS SELECT                  | T2   |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |      |   100M|  4291M|   187K  (1)| 00:00:08 |
|   3 |    TABLE ACCESS STORAGE FULL     | T1   |   100M|  4291M|   187K  (1)| 00:00:08 |
-----------------------------------------------------------------------------------------
SQL_ID  5kr7cpvnmdwat, child number 0
-------------------------------------
insert /*+APPEND NO_GATHER_OPTIMIZER_STATISTICS*/ into T4 select * from t1

Plan hash value: 1069440229
-----------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | INSERT STATEMENT           |      |       |       |   187K(100)|          |
|   1 |  LOAD AS SELECT            | T4   |       |       |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| T1   |   100M|  4291M|   187K  (1)| 00:00:08 |
-----------------------------------------------------------------------------------
I find that it is often useful to profile database time from DBA_HIST_ACTIVE_SESS_HISTORY (or v$active_session_history) by line in the execution plan, in order to see how much time was consumed by the different operations. I can then join the profile to DBA_HIST_SQL_PLAN (or v$sql_plan) to see what is the operation for each line. So long as I also join these tables by SQL_ID, the answer I get will be correct, but I may not always get an answer.
column inst_id heading 'Inst|Id' format 99
column sql_plan_line_id heading 'SQL Plan|Line ID'
column sql_plan_hash_value heading 'SQL Plan|Hash Value'
column ash_secs heading 'ASH|Secs' format 999
break on sql_id skip 1
with h as (
SELECT h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
,      SUM(10) ash_secs
FROM   dba_hist_Active_Sess_history h
WHERE  h.sql_plan_hash_value = 1069440229
AND    h.sql_id IN('c2dy3rmnqp7d7','5kr7cpvnmdwat','g7awpb71jbup1')
GROUP BY h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
)
SELECT h.*, p.operation
FROM   h
  LEFT OUTER JOIN dba_hist_sql_plan p
  ON p.dbid = h.dbid
  and p.sql_id = h.sql_id
  AND p.plan_hash_value = h.sql_plan_hash_value
  AND p.id = h.sql_plan_line_id
ORDER BY 1,2,3
/
You can see that while all the plans have the same hash value, the plan that included the extra OPTIMIZER STATISTICS GATHERING operation produced a different full plan hash value.
If the plan was not captured into AWR or is no longer in the library cache (as is the case for SQL ID g7awpb71jbup1), I don't get a description of the operations in the plan.
                           SQL Plan   SQL Plan SQL Full Plan  ASH
      DBID SQL_ID           Line ID Hash Value    Hash Value Secs OPERATION
---------- ------------- ---------- ---------- ------------- ---- --------------------------------
1278460406 5kr7cpvnmdwat          1 1069440229    2621293920   90 LOAD AS SELECT
1278460406                        2 1069440229    2621293920   30 TABLE ACCESS

1278460406 c2dy3rmnqp7d7          1 1069440229    3552282137  170 LOAD AS SELECT
1278460406                        2 1069440229    3552282137   10 OPTIMIZER STATISTICS GATHERING
1278460406                        3 1069440229    3552282137   60 TABLE ACCESS
1278460406                          1069440229    3552282137   20

1278460406 g7awpb71jbup1          1 1069440229    2621293920   90
1278460406                        2 1069440229    2621293920   20
1278460406                          1069440229    2621293920   10
Normally, I would look for another SQL_ID that produced the same plan hash value.  FULL_PLAN_HASH_VALUE is recorded on V$SQL_PLAN, so I can join that to V$ACTIVE_SESSION_HISTORY.
WITH h as (
SELECT h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value, h.sql_full_plan_hash_value
,      SUM(1) ash_secs
FROM   v$active_session_history h
WHERE  h.sql_plan_hash_value = 1069440229
AND    h.sql_id IN('c2dy3rmnqp7d7','g7awpb71jbup1','5kr7cpvnmdwat')
GROUP BY h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value, h.sql_full_plan_hash_value
), p as (
SELECT DISTINCT plan_hash_value, full_plan_hash_value, id, operation
from v$sql_plan
)
SELECT h.*, p.operation
FROM   h
  LEFT OUTER JOIN p
  ON p.plan_hash_value = h.sql_plan_hash_value
  AND p.full_plan_hash_value = h.sql_full_plan_hash_value
  AND p.id = h.sql_plan_line_id
ORDER BY 1,2,3
/
                SQL Plan   SQL Plan SQL Full Plan  ASH
SQL_ID           Line ID Hash Value    Hash Value Secs OPERATION
------------- ---------- ---------- ------------- ---- --------------------------------
5kr7cpvnmdwat          0 1069440229    2621293920    2 INSERT STATEMENT
                       1 1069440229    2621293920   98 LOAD AS SELECT
                       2 1069440229    2621293920   24 TABLE ACCESS

c2dy3rmnqp7d7          0 1069440229    3552282137    6 INSERT STATEMENT
                       1 1069440229    3552282137  137 LOAD AS SELECT
                       2 1069440229    3552282137   46 OPTIMIZER STATISTICS GATHERING
                       3 1069440229    3552282137   74 TABLE ACCESS
                         1069440229    3552282137    1

g7awpb71jbup1          0 1069440229    2621293920    3 INSERT STATEMENT
                       1 1069440229    2621293920   93 LOAD AS SELECT
                       2 1069440229    2621293920   28 TABLE ACCESS
However, while the full plan hash value is recorded in the ASH data in AWR (in DBA_HIST_ACTIVE_SESS_HISTORY), it is not on the plan history (DBA_HIST_SQL_PLAN).  So, for an execution plan that only sometimes includes online statistics gathering, the operations may not match correctly because the OPTIMIZER STATISTICS GATHERING operation changes the line IDs.
WITH h as (
SELECT h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value, h.sql_full_plan_hash_value
,      SUM(10) ash_secs
FROM   dba_hist_Active_Sess_history h
WHERE  h.sql_plan_hash_value = 1069440229
AND    h.sql_id IN('c2dy3rmnqp7d7','5kr7cpvnmdwat','g7awpb71jbup1')
GROUP BY h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value, h.sql_full_plan_hash_value
), p as (
SELECT DISTINCT dbid, plan_hash_value, id, operation
from dba_hist_sql_plan
)
SELECT h.*, p.operation
FROM   h
  LEFT OUTER JOIN p
  ON p.dbid = h.dbid
  AND p.plan_hash_value = h.sql_plan_hash_value
  AND p.id = h.sql_plan_line_id
ORDER BY 1,2,3
/
If I just join the ASH profile to a distinct list of ID and operation for the same plan hash value but matching any SQL_ID, I can get duplicate rows returned, starting at the line with the OPTIMIZER STATISTICS GATHERING operation because I have different plans with the same plan hash value.
                           SQL Plan   SQL Plan SQL Full Plan  ASH
      DBID SQL_ID           Line ID Hash Value    Hash Value Secs OPERATION
---------- ------------- ---------- ---------- ------------- ---- --------------------------------
1278460406 5kr7cpvnmdwat          1 1069440229    2621293920   90 LOAD AS SELECT
1278460406                        2 1069440229    2621293920   30 OPTIMIZER STATISTICS GATHERING
1278460406                        2 1069440229    2621293920   30 TABLE ACCESS

1278460406 c2dy3rmnqp7d7          1 1069440229    3552282137  170 LOAD AS SELECT
1278460406                        2 1069440229    3552282137   10 TABLE ACCESS
1278460406                        2 1069440229    3552282137   10 OPTIMIZER STATISTICS GATHERING
1278460406                        3 1069440229    3552282137   60 TABLE ACCESS
1278460406                          1069440229    3552282137   20

...
To mitigate this problem, in the following SQL Query, I look for the plan in the shared pool if it is available, and failing that in AWR where I check that the maximum plan line ID for which I have ASH data matches the maximum line ID (i.e. the number of lines) in any alternative plan with the same hash value.
WITH h as (
SELECT h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_value, h.sql_full_plan_hash_value
,      SUM(10) ash_secs
FROM   dba_hist_Active_Sess_history h
WHERE  h.sql_plan_hash_value = 1069440229
AND    h.sql_id IN('c2dy3rmnqp7d7','5kr7cpvnmdwat','g7awpb71jbup1')
GROUP BY h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_value, h.sql_full_plan_hash_value
), x as (
SELECT h.*
,      MAX(sql_plan_line_id) OVER (PARTITION BY h.dbid, h.sql_id, h.sql_plan_hash_value) plan_lines
,      p1.operation
FROM   h
  LEFT OUTER JOIN dba_hist_sql_plan p1
  ON  p1.dbid = h.dbid
  AND p1.sql_id = h.sql_id
  AND p1.plan_hash_value = h.sql_plan_hash_value
  AND p1.id = h.sql_plan_line_id
)
SELECT x.*
, COALESCE(
   (SELECT p2.operation
    FROM   gv$sql_plan p2
    WHERE  p2.plan_hash_value = x.sql_plan_hash_value
    AND    p2.full_plan_hash_value = x.sql_full_plan_hash_value
    AND    p2.id = x.sql_plan_line_id
    AND    rownum = 1)
  ,(SELECT p3.operation
    FROM   dba_hist_sql_plan p3
    WHERE  p3.dbid = x.dbid
    AND    p3.plan_hash_value = x.sql_plan_hash_value
    AND    p3.id = x.sql_plan_line_id
    AND    p3.sql_id IN(
                    SELECT p.sql_id
                    FROM   dba_hist_sql_plan p
                    WHERE  p.dbid = x.dbid
                    AND    p.plan_hash_value = x.sql_plan_hash_value
                    GROUP BY p.dbid, p.sql_id
                    HAVING MAX(p.id) = x.plan_lines)
    AND    rownum = 1)
   ) operation2
FROM   x
ORDER BY 1,2,3
/
Now, I get an operation description for every line ID (if the same plan was gathered for a different SQL_ID).
                           SQL Plan   SQL Plan SQL Full Plan  ASH
      DBID SQL_ID           Line ID Hash Value    Hash Value Secs PLAN_LINES OPERATION                      OPERATION2
---------- ------------- ---------- ---------- ------------- ---- ---------- ------------------------------ ------------------------------
1278460406 5kr7cpvnmdwat          1 1069440229    2621293920   90          2 LOAD AS SELECT                 LOAD AS SELECT
1278460406                        2 1069440229    2621293920   30          2 TABLE ACCESS                   TABLE ACCESS

1278460406 c2dy3rmnqp7d7          1 1069440229    3552282137  170          3 LOAD AS SELECT                 LOAD AS SELECT
1278460406                        2 1069440229    3552282137   10          3 OPTIMIZER STATISTICS GATHERING OPTIMIZER STATISTICS GATHERING
1278460406                        3 1069440229    3552282137   60          3 TABLE ACCESS                   TABLE ACCESS
1278460406                          1069440229    3552282137   20          3

1278460406 g7awpb71jbup1          1 1069440229    2621293920   90          2                                LOAD AS SELECT
1278460406                        2 1069440229    2621293920   20          2                                TABLE ACCESS
1278460406                          1069440229    2621293920   10          2
However, this approach, while better, is still not perfect. I may not have sufficient DB time for the last line in the execution plan to be sampled, and therefore I may not choose a valid alternative plan.

Autonomous & Cloud Databases

Automatic online statistics gathering is becoming a more common occurrence.
  • In the Autonomous Data Warehouse, Oracle has set _optimizer_gather_stats_on_load_all=TRUE, so statistics are collected on every direct-path insert. 
  • From 19c, on Engineered Systems (both in the cloud and on-premises), Real-Time statistics are collected during conventional DML (on inserts, updates and some deletes), also using the OPTIMIZER STATISTICS GATHERING operation. Again, the presence or absence of this operation does not affect the execution plan hash value.

1 comment :

vau said...

Hello!

In 12c there is a new column in all the relevant tables: FULL_PLAN_HASH_VALUE.
It is not present in DBMS_XPLAN output, but that value is different for the different plans. (I did a quick test)
Use it. It will solve the problem you described in this post.


v$sql.PLAN_HASH_VALUE:
Numeric representation of the **current** SQL plan for this cursor. Comparing one PLAN_HASH_VALUE to another easily identifies whether or not two plans are the same (rather than comparing the two plans line by line).

v$sql.FULL_PLAN_HASH_VALUE:
Numeric representation of the **complete** SQL plan for this cursor. Comparing one FULL_PLAN_HASH_VALUE to another easily identifies whether or not two plans are the same (rather than comparing the two plans line by line). Note that the FULL_PLAN_HASH_VALUE cannot be compared across databases releases. It is not backward compatible.


Cheers,
RobK