Friday, September 06, 2019

Working with AWR: Old Statistics and Costs in Current Execution Plans

This is a blog about a little trap, into which it is easy to fall when analysing performance problems using ASH data and execution plans from AWR.
I will illustrate it with a real example taken from a production PeopleSoft system. A number of sessions were waiting on blocking locks.  I used ASH data to find statements running while holding a blocking lock.  That led me to a particular SQL ID and plan have value.
------------- ------------------- ----------
5st32un4a2y92          2494504609      10670
652mx4tffq415          1888029394       7030
artqgxug4z0f1             8450529        580
7qxdrwcn4yzhh          3723363341        270
So, I used DBMS_XPLAN.DISPLAY_AWR to extract the execution plan from the AWR repository.
SELECT * FROM table(dbms_xplan.display_awr('5st32un4a2y92',2494504609,NULL,'ADVANCED'));
I can see the statement, the execution plan, the costs of each operation in the plan, and the value of the bind variable. This is a statement that has consumed a lot of execution time, it isn't executing that frequently, yet the cost of the statement is quite low (only 22). That, in itself, does not mean there is a problem here, but it is perhaps a suggestion of an inconsistency.
SQL_ID 5st32un4a2y92

Plan hash value: 2494504609

| Id  | Operation            | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT     |             |       |       |    22 (100)|          |
|   1 |  INDEX FAST FULL SCAN| PS_CDM_LIST |     1 |     5 |    22  (10)| 00:00:01 |
Peeked Binds (identified by position):
   1 - :1 (NUMBER): 17776
Familiarity with the application is an advantage here. The value of CONTENTID comes from a sequence number (albeit generated from a table, not an Oracle sequence because this is PeopleSoft). The system has been running for a while, and Process Instances and Context IDs were well in excess of 1,000,000, so why is the bind variable only 17776? That is another alarm bell.
I can extract the current execution plan from the library cache, if the statement is in memory, using DBMS_XPLAN.DISPLAY_CURSOR().
SELECT * FROM table(dbms_xplan.display_cursor('5st32un4a2y92',NULL,'ADVANCED')); 

Plan hash value: 2494504609

| Id  | Operation            | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT     |             |     1 |     6 |  3178   (9)| 00:00:05 |
|*  1 |  INDEX FAST FULL SCAN| PS_CDM_LIST |     1 |     6 |  3178   (9)| 00:00:05 |
Peeked Binds (identified by position):
   1 - :1 (NUMBER): 10897657

Predicate Information (identified by operation id):
   1 - filter("CONTENTID"=TO_NUMBER(:1))
DBMS_XPLAN.DISPLAY_CURSOR() shows the current bind variable value. It is immediately obvious it has moved on since the plan was captured by AWR. If the statement is not in memory, I could generate a new execution plan using EXPLAIN PLAN FOR, that reparses the statement, and then extract the execution plan with DBMS_XPLAN.DISPLAY(). However, this will not give me the bind variable value.


select * from table(dbms_xplan.display(null,null,'ADVANCED'));
The execution plan has the same plan hash value, so it is the same plan. However, the cost has increased significantly (from 22 to 3178). This reflects how the table has grown over time, and that is the problem. The full scan of the index was fast when the table was small, but as the table has grown over time, the scan is also taking longer. By the time I came to look at it, it was quite significant.
In this case, we created a new index on PS_CDM_LIST leading on CONTENTID to satisfy this query. The result was to improve the performance of this statement, and a reduce in time spent holding blocking locks which has further benefits.

What is going on here? 

This SQL statement is a regular part of the application. The SQL ID doesn't change because it uses a bind variable. The execution plan hasn't changed either because the optimizer can't find a better option. The execution plan we obtained from AWR was the first such plan captured for that combination of SQL ID and plan hash value. AWR may, in fact, is very likely to, have captured that plan again subsequently. However, it did not update the plan in AWR, and the old plan was never purged.
The original cost (22) was the cost then, not now. The value of the bind variable (17776) was the value then not now!
So how do you know you are looking at a plan with old costs and bind variables? You don't.  There is nothing explicit to warn you to go and check.
However, if things are not making sense, see if the cost for the same plan is close to the costs stored in AWR.
If you use EXPLAIN PLAN FOR you may get a different plan, in which case you will need to use the hints in the outline section of the execution plan (not shown above) to force it to produce the same execution plan before you can compare costs.

1 comment :

Tanel said...

Nice one! There's a TIMESTAMP column in DBA_HIST_SQL_PLAN (and v$sql_plan), meaning "Date and time when the execution plan was generated" according to the docs, so at least you'd know when the plan was built.