UKOUG Tech16 Conference

Friday, September 16, 2016

One of my Favourite Database Things: DBMS_APPLICATION_INFO

Notes for my slot in session UGF2630: EOUC Database ACES Share Their Favorite Database Things: Part I – OpenWorld , Sunday 18th September 2016
DBMS_APPLICATION_INFO is probably one of the most undervalued and without any doubt the most underused of all the packages supplied with the database by Oracle. On the face it, it is a very simple package that does a few very simple things, but it is not possible to overstate its significance.

Oracle 7.3.3

It has also been around for a long time. This package was first documented in Oracle 7.3.3. "Application developers can use the DBMS_APPLICATION_INFO package to record the name of the executing module or transaction in the database for use later when tracking the performance of various modules… System administrators can also use this information to track resource usage by module. When an application registers with the database, its name and actions are recorded in the V$SESSION and V$SQLAREA views. Your applications should set the name of the module and name of the action automatically each time a user enters that module. The module name could be the name of a form in an Oracle Forms application, or the name of the code segment in an Oracle Precompilers application. The action name should usually be the name or description of the current transaction within a module." – Oracle 7 Tuning, release 7.3.3 ©Oracle 1997, Chapter 23 Registering Applications*

How to Register a Session

Make a simple call to DBMS_APPLICATION_INFO, and you will be able to see MODULE and ACTION in your session information,
EXECUTE dbms_application_info.set_module(module_name=>'MYMODULE',action_name=>'MYACTION');
MODULE and ACTION are just character strings. You can put anything you like in there, but I recommend that the values are meaningful to your application, consistent and predicable.
select sid, serial#, username, module, action
from v$session
where sid IN(select distinct sid from v$mystat)
/
       SID    SERIAL# USERNAME MODULE               ACTION
---------- ---------- -------- -------------------- --------------------------------
       367       2515 SCOTT    MYMODULE             MYACTION
And you can also identify SQL for that module in v$sqlarea.
select sql_id, module, action, sql_text
from v$sqlarea
where module = 'MYMODULE'
/
SQL_ID        MODULE               ACTION
------------- -------------------- --------------------------------
SQL_TEXT
---------------------------------------------------------------------------------------------
17f888us1tbk2 MYMODULE             MYACTION
select sid, serial#, username, module, action from v$session where sid IN(select distinct sid from v$mystat)

9bsqpcnygpgp7 MYMODULE             MYACTION
select sql_id, module, action, sql_text from v$sqlarea where module = 'MYMODULE'
This hasn't changed since 1997 (except the strings have got longer). Today, there are a few more dynamic performance views where module and action are recorded.

Tracing

When we did performance tuning back in the 1990s, all we had was the buffer cache hit ratio (joke) and SQL trace (an even older feature – introduced in Oracle 6, augmented by the wait interface in 7.0.12). When you registered a session with DBMS_APPLICAITON_INFO, it was recorded in the trace. So you knew which part of the trace refers to which value of module and action.
This following is taken from a 12c trace.  You can see where I called DBMS_APPLICATION_INFO to set the module and action.
Trace file /home/oracle/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_8795.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
…
*** 2016-08-11 11:17:42.803
*** SESSION ID:(367.2515) 2016-08-11 11:17:42.803
*** CLIENT ID:() 2016-08-11 11:17:42.803
*** SERVICE NAME:(orcl) 2016-08-11 11:17:42.803
*** MODULE NAME:() 2016-08-11 11:17:42.803
*** CLIENT DRIVER:(SQL*PLUS) 2016-08-11 11:17:42.803
*** ACTION NAME:() 2016-08-11 11:17:42.803
*** CONTAINER ID:(3) 2016-08-11 11:17:42.803
…
PARSING IN CURSOR #140183328563984 len=95 dep=0 uid=156 oct=47 lid=156 tim=79763572382 hv=611933126 ad='695370e0' sqlid='dbvj2xsk7kqy6'
BEGIN dbms_application_info.set_module(module_name=>'MYMODULE',action_name=>'MYACTION'); END;
*** MODULE NAME:(MYMODULE) 2016-08-11 11:17:42.972
*** ACTION NAME:(MYACTION) 2016-08-11 11:17:42.972
You can use the trcsess utility to extract part of a trace by module and action
trcsess output=MYACTION.trc module=MYMODULE cdb1_ora_8795.trc
and this is the start of the resulting file
*** [ Unix process pid: 8795 ]
*** 2016-08-11 11:17:42.972
*** 2016-08-11 11:17:42.972
…
And again, this hasn't changed since it was introduced

Client ID

There is also a function in DBMS_APPLICATION_INFO to set a client identifier. Again it is just a string. The idea is that you can associate an operator with a database session.
dbms_application_info.set_client_info(client_info IN VARCHAR2)
For example, PeopleSoft makes practical use of this to get a user ID set by the application into database triggers that perform application auditing. It uses another procedure to read the client identifier.
dbms_application_info.read_client_info(client_info OUT VARCHAR2)
Although, I find it is often more interesting to know where in an application a SQL came from, than which operator activated the code.

Confession

When I first looked at this feature back in the late 1990s, I struggled to see the value of MODULE and ACTION, but by then I was working with PeopleSoft, a packaged application that is still with us, and back then it did not register with the database. I used triggers to enable trace for specific processes, and then I would profile the whole trace file.

Oracle 10g

Two new things in Oracle 10g made DBMS_APPLICATION_INFO very much more interesting that it had seemed to be.
  • DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE 
  • Active Session History 

DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE 

From Oracle 10g, it became possible to specify values of MODULE and/or ACTION for which the database would automatically enable SQL Trace. It was like setting a watch point in a debugger. Trace would be disabled when the module/action changed to something else.
BEGIN
 sys.dbms_monitor.serv_mod_act_trace_enable
 (service_name=>'P1PPP'
 ,module_name =>'RECV_PO'
 ,waits=>TRUE
 ,binds=>TRUE);
END;
/
So, if your application set module and action to meaningful, predictable values, you now had an easy way of enabling SQL trace (See also Enabling Oracle Extended SQL Trace by Module and Action). This is why I generally avoid putting the values of variables into module or action because it makes it difficult to enable trace by this method, unless perhaps the variable has relatively few known values.

Active Session History 

ASH was introduced in 10g. It is only available on Enterprise Edition and it requires the Diagnostics Pack licence. Every second, the database records the status of active database sessions in a buffer in memory. Every 10th sample is persisted to a table in the Automatic Workload Repository (AWR). Module and Action are collected as a part of that. Hence, you can analyse the ASH data with SQL Query.
WITH x AS (
SELECT  h.*
,  CASE WHEN h.program LIKE 'PS%' THEN REGEXP_SUBSTR(h.program,'[^.@]+',1,1) ELSE h.program END as program2
FROM dba_hist_active_Sess_history h
), y AS (
SELECT  x.program2, x.module, x.action, sum(10) ash_Secs
FROM x 
GROUP BY x.module, x.action, x.program2
)
SELECT * FROM y
WHERE  ash_secs >= 300
ORDER BY  ash_Secs desc
/
This query profiles the amount of database time by program, module and action. We can see which modules and actions taken the most database time. Then we could look for specific SQL statements associated with a particular module.
                                                                                     ASH
PROGRAM2                 MODULE                      ACTION                         Secs
------------------------ --------------------------- -------------------------- -------- 
PSAPPSRV                 RAE_EVENT_DETAIL            RAE_EVENT_DETAIL               5800
PSAPPSRV                 RC_CASE                     RC_CASE                        2100
PSAPPSRV                 RC_CASE                     RC_CASE_TSK                     780
PSAPPSRV                 RA_CM_HOME_GRD              RA_CM_HOME_GRD                  630
PSAPPSRV                 XX_PGM_CASE                 XX_PGM_CASE                     600
PSAPPSRV                 RB_TSK                      RB_TSK                          440
oracle@XXX1 (LGWR)                                                                   340
oracle@XXX2 (LGWR)                                                                   330
…
Module and Action are also visible in:
  • Enterprise Manager: This example shows two actions for a specific module
  • AWR reports: The profile of SQL statements reports the module.
Elapsed Time (s)
Execs
Elapsed Time per Exec (s)
%Total
%CPU
%IO
SQL Id
SQL Module
SQL Text
305.11
1
305.11
10.62
58.94
0.97
dcxx3tvxsm4az
RA_CM_HOME_GRD
SELECT BUSINESS_UNIT, RA_CAMPA...
110.51
342
0.32
3.85
47.67
0.00
bvbbkrnyjbyph
RAE_EVENT_DETAIL
SELECT PRCSINSTANCE, JOBINSTAN...
41.42
340
0.12
1.44
48.73
0.00
cb3p0zbpuw2ha
RAE_EVENT_DETAIL
SELECT BUSINESS_UNIT, RA_CAMPA...
31.74
4
7.94
1.10
57.65
0.00
7qh8s1qgz0axd
RBQ_BOSRCH_QCREATE
SELECT * FROM ( SELECT DISTINC...
25.49
812
0.03
0.89
51.36
0.29
0fj122xjvgpuc

SELECT /* OPT_DYN_SAMP */ /*+ ...
24.49
1,782
0.01
0.85
49.04
0.00
9d8rmyu6hzh5v

SELECT /* OPT_DYN_SAMP */ /*+ ...
22.50
4,232
0.01
0.78
48.38
0.00
16fx1qf2ujdrw

SELECT /* OPT_DYN_SAMP */ /*+ ...
21.63
1,543
0.01
0.75
50.30
0.00
8nzp1phr63835
XX_PGM_CASE
SELECT PROVIDER_GRP_ID, PERSON...
19.40
1
19.40
0.68
34.48
0.00
1atzwyuw0mqwn
RBQ_BOSRCH_QCREATE
SELECT * FROM ( SELECT DISTINC...
19.31
301,677
0.00
0.67
46.24
0.00
2t98byasd6zzk
RC_CASE
SELECT EOCF_LIB_TERM_ID, EOCF_...

Other Features

  • Resource Manager: policies can be specified for a module and/or action. Which could be interesting, but I haven't played with it personally.  

How to Instrument PL/SQL 

I am not a developer, and so it is not my day job to write PL/SQL. However, when I do, I instrument the code by adding calls to DBMS_APPLICATION_INFO as follows:
  • I store the current values of module and action at the start of each procedure in local variables.
  • I set the action to the name of the procedure.  I may set other values during the procedure.
  • The original values are reinstated at every exit point from the procedure.
  • I have a constant with the name of the procedure. I use this to set the value of module.
k_module CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;
…
PROCEDURE my_procedure IS
  l_module VARCHAR2(48);
  l_action VARCHAR2(32);
BEGIN
  dbms_application_info.read_module(module_name=>l_module
                                   ,action_name=>l_action);
  dbms_application_info.set_module(module_name=>k_module
                                  ,action_name=>'MY_PROCEDURE');
/*application code goes here*/
…
  dbms_application_info.set_module(module_name=>l_module
                                  ,action_name=>l_action);
EXCEPTION
  WHEN … THEN
    dbms_application_info.set_module(module_name=>l_module
                                    ,action_name=>l_action);
    RAISE / EXIT
…
END my_procedure;
I would characterise this as an investment rather than an overhead during development, in the same way debug code is an investment that will pay dividends later if you are debugging a functional problem. The overhead of the DBMS_APPLICATION_INFO calls is trivially small, and they present the opportunity to make performance improvement activities much more effective for both developers and DBAs.

What did Instrumentation ever do for us? 

This chart is taken from EDB360 for yet another Java system that does not have any Instrumentation. Module defaults to the program name. A profile of DB time by module is not particularly useful when nearly everything has the same module.
For contrast, this example is taken from a PeopleSoft system. PeopleSoft is now instrumented out-of-the-box. With instrumentation, you can see that the time is broken down between the various components of the system. Now you can see which components consume the most time and justify the most attention. Many of the module names are meaningful to the developers.

Unfortunately very few developers or package vendors do this. This includes Oracle, although they have instrumented their E-Business Suite and PeopleSoft ERP packages, many others are not.

Conclusion 

Nothing in this story is new.
  • ASH: Oracle 10g, 2003
  • DBMS_APPLICATION_INFO: Oracle 7.3.3, 1997 
  • Wait Interface: Oracle 7, 1992
  • Session SQL Trace: Oracle 6, 1988
I write and talk about it instrumentation because I think it is impossible to overstate the value of instrumentation to an application using DBMS_APPLICATION_INFO and yet it is very rare to find instrumented systems.
I would characterise instrumentation, along with debug code, as a small investment during development that could pay big dividends later, rather than an overhead.  Just as effective debug code can help you debug a functional problem, effective instrumentation can help you analyze performance issues.
The overhead of the DBMS_APPLICATION_INFO calls is trivially small, and they make performance improvement activities much easier for both developers and DBAs.  So really, it doesn't cost you anything, it will save you effort and resource later.

*My thanks to Mogens Nørgaard and to Peter Gram who is the keeper of the Oracle Museum at Miracle in Denmark

Friday, July 15, 2016

Complete Refresh of Materialized Views: Atomic, Non-Atomic and Out-of-Place

Recently, and more than once, I have found myself explaining exactly how an Oracle database performs a complete refresh of a materialized view, and what are the implications.  So I thought I would set it out in a blog post.
I am only looking at the case where it is not possible to incrementally refresh using on a materialized view log because the materialized view joins tables together, groups data, or breaches one of the other restrictions (see General Restrictions on Fast Refresh).

Behaviour of DBMS_MVIEW.REFRESH

Materialized views are managed by the delivered DBMS_MVIEW package.  The REFRESH procedure can refresh one or more materialized views.  I want to discuss at the effect of the ATOMIC_REFRESH parameter.  This parameter has been present since at least Oracle 8i when materialized views were called snapshots, so none of this is new.
DBMS_MVIEW.REFRESH (
   { list                 IN     VARCHAR2,
   | tab                  IN     DBMS_UTILITY.UNCL_ARRAY,}
…
   atomic_refresh         IN     BOOLEAN        := true,
…
);
This is what the Oracle 12c documentation says:
"If this parameter is set to true, then the list of materialized views is refreshed in a single transaction. All of the refreshed materialized views are updated to a single point in time. If the refresh fails for any of the materialized views, none of the materialized views are updated.
If this parameter is set to false, then each of the materialized views is refreshed non-atomically in separate transactions.
As part of complete refresh, if truncate is used (non-atomic refresh), unique index rebuild is executed. INDEX REBUILD automatically computes statistics. Thus, statistics are updated for truncated tables."
It is possible to see the behaviour of the refresh procedure by using SQL trace.  My tests were run on a 12c database.  I have cut a lot out of the traces to make them easier to read.

Trace of Atomic Refresh

  • All constraints are deferred to the end of the transaction which is not until all the specified materialized views are refreshed.
  • For each materialized view, all rows are deleted
  • Then the complete content is reinserted.
  • If more than one materialised view is refreshed in a refresh group, or by specifying them in a single execution of refresh, they are refreshed in a single transaction.
Trace file /home/oracle/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_27100_ATOMIC.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
System name: Linux
Node name: localhost.localdomain
Release: 3.8.13-68.1.3.el7uek.x86_64
Version: #2 SMP Wed Apr 22 11:51:54 PDT 2015
Machine: x86_64
=====================
PARSING IN CURSOR #140647220341720 len=73 dep=0 uid=156 oct=47 lid=156 tim=2204121046481 hv=3058798929 ad='c86e9600' sqlid='1rz3cakv534aj'
BEGIN DBMS_MVIEW.REFRESH('MY_EXAMPLE2','C', atomic_refresh=>TRUE); END;
END OF STMT
PARSE #140647220341720:c=0,e=139,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=2204121046475
=====================
PARSING IN CURSOR #140647220211192 len=28 dep=1 uid=156 oct=90 lid=156 tim=2204121059425 hv=3879155062 ad='c88f49d0' sqlid='231qv0rmmfdbq'
SET CONSTRAINTS ALL DEFERRED
END OF STMT
PARSE #140647220211192:c=0,e=93,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=2204121059418
EXEC #140647220211192:c=0,e=72,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=2204121059610
CLOSE #140647220211192:c=0,e=9,dep=1,type=0,tim=2204121059737
=====================
PARSING IN CURSOR #140647218302768 len=34 dep=1 uid=156 oct=7 lid=156 tim=2204121175365 hv=812824424 ad='c86e1658' sqlid='46t32c0s75dv8'
 delete from "SCOTT"."MY_EXAMPLE2"
END OF STMT
PARSE #140647218302768:c=0,e=93,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2149302637,tim=2204121175360
EXEC #140647218302768:c=0,e=303,p=0,cr=3,cu=2,mis=0,r=1,dep=1,og=1,plh=2149302637,tim=2204121175801
pw=0 time=93 us cost=2 size=0 card=1)'
=====================
PARSING IN CURSOR #140647218302768 len=177 dep=1 uid=156 oct=2 lid=156 tim=2204121185717 hv=1441390243 ad='c86d91c8' sqlid='9ctg639aymrp3'
INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MY_EXAMPLE2" SELECT 2 id
,      sysdate my_system_date
,      count(*) X
from kill_cpu
connect by n > prior n
start with n = 1
END OF STMT
PARSE #140647218302768:c=9000,e=8028,p=0,cr=8,cu=0,mis=1,r=0,dep=1,og=1,plh=2203518424,tim=2204121185711
*** 2016-07-11 14:44:24.152
EXEC #140647218302768:c=11189000,e=11191474,p=0,cr=4,cu=3,mis=0,r=1,dep=1,og=1,plh=2203518424,tim=2204132377380
CLOSE #140647218302768:c=0,e=18,dep=1,type=0,tim=2204132377809

Trace of Non-Atomic Refresh

  • Again, all constraints are deferred to the end of the transaction, which this time is only as far as the truncate statement.
  • An exclusive lock is obtained on the materialized view to ensure that there are no in-flight transactions on it.
  • The materialized view is then truncated.  The purge materialized view log option is specified in case another materialized view is based on this materialized view
  • The materialized view is repopulated, but in direct path mode because the APPEND hint is specified.
  • Stats are automatically collected on a 12c database (note this statement is one level deeper).
Trace file /home/oracle/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_27100_NOTATOMIC.trc
=====================
PARSING IN CURSOR #140647224204240 len=74 dep=0 uid=156 oct=47 lid=156 tim=2204109695504 hv=2544739527 ad='c8c758b8' sqlid='fm9xzbfbuv967'
BEGIN DBMS_MVIEW.REFRESH('MY_EXAMPLE1','C', atomic_refresh=>FALSE); END;
END OF STMT
PARSE #140647224204240:c=0,e=124,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=2204109695500
=====================
PARSING IN CURSOR #140647223494264 len=28 dep=1 uid=156 oct=90 lid=156 tim=2204109716722 hv=3879155062 ad='c88f49d0' sqlid='231qv0rmmfdbq'
SET CONSTRAINTS ALL DEFERRED
END OF STMT
PARSE #140647223494264:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=2204109716718
EXEC #140647223494264:c=0,e=8,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=2204109716836
CLOSE #140647223494264:c=0,e=15,dep=1,type=0,tim=2204109717294 
=====================
PARSING IN CURSOR #140647218294296 len=59 dep=2 uid=156 oct=26 lid=156 tim=2204109881174 hv=846098398 ad='c8831a50' sqlid='8td4wj8t6wvyy'
LOCK TABLE "SCOTT"."MY_EXAMPLE1" IN EXCLUSIVE MODE  NOWAIT 
END OF STMT
PARSE #140647218294296:c=1000,e=759,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=1,plh=0,tim=2204109881168
EXEC #140647218294296:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=1,plh=0,tim=2204109881317
CLOSE #140647218294296:c=0,e=11,dep=2,type=0,tim=2204109881369
=====================
PARSING IN CURSOR #140647218302768 len=56 dep=1 uid=156 oct=85 lid=156 tim=2204109882394 hv=4078491682 ad='c8838320' sqlid='1hkq5cgtjjq12'
 truncate table "SCOTT"."MY_EXAMPLE1" purge snapshot log
END OF STMT
PARSE #140647218302768:c=3000,e=3255,p=0,cr=6,cu=1,mis=1,r=0,dep=1,og=1,plh=1592878302,tim=2204109882390
=====================
PARSING IN CURSOR #140647218302768 len=185 dep=1 uid=156 oct=2 lid=156 tim=2204109979170 hv=3331603390 ad='c8824c90' sqlid='bag14pr398fxy'
INSERT /*+ BYPASS_RECURSIVE_CHECK APPEND  */ INTO "SCOTT"."MY_EXAMPLE1" SELECT 1 id
,      sysdate my_system_date
,      count(*) X
from kill_cpu
connect by n > prior n
start with n = 1
END OF STMT
PARSE #140647218302768:c=58000,e=58028,p=0,cr=178,cu=0,mis=1,r=0,dep=1,og=1,plh=1420092967,tim=2204109979166
CLOSE #140647224167168:c=1000,e=608,dep=2,type=1,tim=2204120957626
=====================
PARSING IN CURSOR #140647224532056 len=124 dep=2 uid=0 oct=47 lid=0 tim=2204120958103 hv=1731287439 ad='d54ed0e0' sqlid='827tjqdmm2qcg'
BEGIN    dbms_stats.postprocess_stats(:owner, :tabname, :partname,      :subpartname, 2, :rawstats, :selmap, :clist);  END; 
END OF STMT
PARSE #140647224532056:c=0,e=128,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=0,tim=2204120958098

Comparison

While there are lots of advantages to non-atomic refresh, but perhaps the most significant disadvantage for developers is that the data disappears during refresh.  I think it is this that leads to widespread use of atomic refresh, but in the next section I show how this can be overcome.

Atomic Refresh Non-Atomic Refresh
Delete, Insert, Commit Truncate, Insert /*+APPEND*/
Space occupied by deleted rows not available to be reused until the delete is committed, which is not until after the insert. High water marks reset by truncate. All space in table released immediately. Table likely to be about half size of atomically refreshed materialized view.
Significant undo (and therefore also redo on undo) overhead. Minimal undo overhead.
Delete and Insert performed row-by-row. Indexes are maintained during DML. Insert performed in direct path mode. Faster. Indexes rebuilt when insert committed.
Table can be compressed by explicitly rebuilding after refresh, or requires Advanced Compression Licence. Table data can be compressed using basic compression without needing Advanced Compression Licence.
This would also work well with Hybrid Columnar Compression (HCC) on Exadata.
Statistics are not collected. It may be necessary to collect them manually. In 12c, table statistics will automatically be updated.
Content of materialized view always available due to read consistency. Materialized view will appear to be empty while refresh process is running.
Refresh of multiple materialized views can occur in a single transaction so that they remain consistent. Each materialized view refreshed independently.

Non-Atomic Refresh Without Temporarily Losing the Data

In this section I want to explain how to evade the problem of the data disappearing from the materialized view during non-atomic refresh.  Like all the best conjuring tricks, you require two materialized views!  First, I will describe a workaround for use with Oracle 11g or earlier, then I will look at a new feature in Oracle 12c.
I want to create a simple stand-alone demonstration.  I need a materialized view that will take several seconds to refresh so I can check whether the data is still available during that process.  So, I need a long running query, for which I am using Jonathan Lewis's kill_cpu script.  On my database, it runs for around 10 seconds.
set timi on
DROP TABLE kill_cpu PURGE
/
create table kill_cpu(n primary key) organization index as 
select rownum from dual connect by level <=1
/
--this is Jonathan Lewis' killcpu query - duration doubles for every row in kill_cpu table - so the MVs take time to refresh
select count(*) X 
from kill_cpu 
connect by n > prior n 
start with n = 1 
/
I am going to create two identical materialized views based on (almost) the same query.
--create 2 identical MVs.  this time they will be fast to build
DROP MATERIALIZED VIEW my_example1;
DROP MATERIALIZED VIEW my_example2;

CREATE MATERIALIZED VIEW my_example1
COMPRESS PCTFREE 0
BUILD IMMEDIATE
AS 
SELECT 1 id
,      sysdate my_system_date
,      count(*) X 
from kill_cpu 
connect by n > prior n 
start with n = 1 
/

CREATE MATERIALIZED VIEW my_example2
COMPRESS PCTFREE 0
BUILD IMMEDIATE
AS 
SELECT 2 id
,      sysdate my_system_date
,      count(*) X 
from kill_cpu 
connect by n > prior n 
start with n = 1 
/
--this makes the query in the MVs slow to execute
TRUNCATE TABLE kill_cpu;
INSERT INTO kill_cpu
SELECT rownum FROM dual CONNECT BY LEVEL <=24
/
commit
/
I can see my materialized views and when they were last refreshed
alter session set nls_date_Format = 'hh24:mi:ss dd.mm.yyyy';
--list MVs and last refresh dates
column mview_name format a20
select mview_name, last_refresh_date, last_refresh_end_time
from user_mviews
order by last_refresh_date
/

MVIEW_NAME           LAST_REFRESH_DATE   LAST_REFRESH_END_TI
-------------------- ------------------- -------------------
MY_EXAMPLE1          14:44:01 11.07.2016 14:44:12 11.07.2016
MY_EXAMPLE2          14:44:12 11.07.2016 14:44:23 11.07.2016
The following PL/SQL block
  • refreshes the older version of the two materialized views
  • It then recreates a view on the freshly refreshed materialized view.
    • The application should reference this view, and so it will always point to the most up-to-date materialized view.
    • The view will refer to the older materialized until the refresh is complete so the data will never disappear.
    • Complex view merging means that any query will seamlessly reference the underlying table and the performance will be as if the application had referred to the materialized view.
    • In-flight queries will not be affected by the view being rebuilt. They will continue to run against the now older materialized view.
--refresh MVs, collect stats, rebuild view, 
set serveroutput on timi on
DECLARE
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
  l_mview_owner VARCHAR2(128) := 'SCOTT';
  l_mview_name  VARCHAR2(128);
  l_sql CLOB;
BEGIN
  dbms_application_info.read_module(module_name=>l_module, action_name=>l_action);
  dbms_application_info.set_module(module_name=>'MV_EXAMPLE', action_name=>'Initialise');
  WITH x AS (
    SELECT ROW_NUMBER() OVER (ORDER BY last_refresh_date) as ranking
    ,      mview_name, last_refresh_date  
    FROM   user_mviews
    WHERE  owner = l_mview_owner
    AND    mview_name IN('MY_EXAMPLE1','MY_EXAMPLE2')
  ) 
  SELECT mview_name
  INTO   l_mview_name
  FROM   x
  WHERE  ranking = 1;

  dbms_application_info.set_action(action_name=>'Refresh '||l_mview_name);  
  dbms_output.put_line('About to refresh mview '||l_mview_name);
  DBMS_MVIEW.REFRESH(l_mview_name,'C', atomic_refresh=>FALSE);

  --add stats collection here – not necessary in 12c
  --dbms_application_info.set_action(action_name=>'Collect Stats '||l_mview_name);  
  --dbms_stats.gather_Table_stats(ownname=>l_mview_owner, tabname=>l_mview_name, cascade=>TRUE);

  --rebuild view to point to MV just refreshed  
  dbms_application_info.set_action(action_name=>'Rebuild view MY_EXAMPLE');  
  l_sql:= 'CREATE OR REPLACE VIEW my_example AS SELECT * FROM '||l_mview_name;
  dbms_output.put_line('SQL:'||l_sql);
  EXECUTE IMMEDIATE l_sql;

  --reinstate initial settings for module and action
  dbms_application_info.set_module(module_name=>l_module, action_name=>l_action);
END;
/
select mview_name, last_refresh_date, last_refresh_end_time
from user_mviews
order by last_refresh_date
/
The first time I ran the script MY_EXAMPLE1 was refreshed and the view was rebuilt to point to it.
About to refresh mview MY_EXAMPLE1
SQL:CREATE OR REPLACE VIEW my_example AS SELECT * FROM MY_EXAMPLE1

MVIEW_NAME           LAST_REFRESH_DATE   LAST_REFRESH_END_TI
-------------------- ------------------- -------------------
MY_EXAMPLE2          14:44:12 11.07.2016 14:44:23 11.07.2016
MY_EXAMPLE1          10:39:57 12.07.2016 10:40:06 12.07.2016
The next time I ran the script MY_EXAMPLE2 was refreshed and the view was again rebuilt.
About to refresh mview MY_EXAMPLE2
SQL:CREATE OR REPLACE VIEW my_example AS SELECT * FROM MY_EXAMPLE2

MVIEW_NAME           LAST_REFRESH_DATE   LAST_REFRESH_END_TI
-------------------- ------------------- -------------------
MY_EXAMPLE1          10:39:57 12.07.2016 10:40:06 12.07.2016
MY_EXAMPLE2          10:40:50 12.07.2016 10:40:59 12.07.2016

Summary 

There is a trade-off to this approach, but I suggest that the advantages of this approach to using non-atomic refresh outweigh the additional complexities.
Advantages Disadvantanges
Data is always available during materialized view refresh. The refresh and view rebuild process is more complex, though it could be encapsulated into a PL/SQL procedure.
Each materialized view will be half the size of the atomically refresh materialized view because space is freed by the delete is only available after the commit. So the overall space requirement is about the same as atomic refresh. Two copies of the materialized view and its indexes are required.
Refresh will be faster because indexes are not maintained during the insert, but are rebuilt afterwards.
Queries no longer run on view being refreshed, so database does not have to do consistent reads of data blocks. This should prevent query performance being degraded during refresh.
Basic table compression (or HCC on Exadata) could make it considerably smaller. This should also improve query performance by reducing I/O.
I have not reserved any free space in the table in my demonstration because I know the table will never be updated. Further saving space and reducing I/O.

Out-of-Place Refresh in Oracle 12c

Out-of-place refresh is a new option on DBMS_MVIEW.REFRESH in Oracle 12c.  The complete refresh process builds a new table which seamlessly becomes the materialized view, the old table is dropped.  It only works in conjunction with non-atomic refresh.  It also works with incremental (or fast) refresh.
As with my 11g workaround described above, two copies of the data are used. The refresh process creates and populates a new table with indexes, exchanges the new with the old, and drops the old table. For this example, I have enabled the recycle bin so we can see the dropped table.
exec DBMS_MVIEW.REFRESH('MY_EXAMPLE1','C', atomic_refresh=>FALSE, out_of_place=>TRUE);
select object_id, object_type, object_name, last_ddl_time
from user_objects
where object_name LIKE 'MY_EXAMPLE1'
order by 1
/
select original_name, createtime, droptime, dropscn, base_object, purge_object
from user_recyclebin
order by dropscn
/
Each time I refresh the materialized view, the table gets a new object ID, and a new object appears in the recycle bin.
 OBJECT_ID OBJECT_TYPE             OBJECT_NAME     LAST_DDL_TIME
---------- ----------------------- --------------- -------------------
    112366 MATERIALIZED VIEW       MY_EXAMPLE1     09:50:06 14.07.2016
    112382 TABLE                   MY_EXAMPLE1     08:40:34 14.07.2016

ORIGINAL_NAME   CREATETIME          DROPTIME               DROPSCN BASE_OBJECT
--------------- ------------------- ------------------- ---------- -----------
RV$1B6FD        2016-07-14:09:43:43 2016-07-14:09:50:05   68878082      112381
The object ID of the new entry is the previous object ID of the materialized view. A new table was created and the old table was dropped. The intermediate table name is RV$ followed by the object ID in hexadecimal.
 OBJECT_ID OBJECT_TYPE             OBJECT_NAME     LAST_DDL_TIME
---------- ----------------------- --------------- -------------------
    112366 MATERIALIZED VIEW       MY_EXAMPLE1     09:51:41 14.07.2016
    112383 TABLE                   MY_EXAMPLE1     08:40:34 14.07.2016

ORIGINAL_NAME   CREATETIME          DROPTIME               DROPSCN BASE_OBJECT
--------------- ------------------- ------------------- ---------- -----------
RV$1B6FD        2016-07-14:09:43:43 2016-07-14:09:50:05   68878082      112381
RV$1B6FE        2016-07-14:09:50:03 2016-07-14:09:51:40   68878546      112382
A trace of the refresh process shows the data being inserted into another table. However, note that it is a conventional insert, not a direct-path insert. So statistics will not be collected automatically. The DDL statements at create and drop interim tables are truncated in the trace, but can be reinstated with event 10979, level 1 (see Jonathan Lewis's 12c MView refresh blog).  This event also causes additional progress information about the refresh process to be written to the trace file.
PARSING IN CURSOR #139759670421744 len=178 dep=1 uid=156 oct=2 lid=156 tim=2412454493901 hv=3909484904 ad='d56b62a8' sqlid='0k8chjrnhc0b8'
/* MV_REFRESH (ITB) */INSERT INTO "SCOTT"."RV$1B6E0"  SELECT 1 id
,      sysdate my_system_date
,      n
--,      count(*) X
from kill_cpu
connect by n > prior n
start with n = 1
END OF STMT

Summary

Out-of-Place refresh has some advantages and disadvantages over my 11g method.
AdvantagesDisadvantanges
The entire refresh process is encapsulated within the DBMS_MVIEW.REFRESH procedure.
Physical tablespace is still required to hold two copies of the table, but now only during the refresh. The original table is dropped after the new one is built.  If the recycle bin is enabled, every out-of-place refresh will preserve another copy of the table and any indexes. I recommend disabling the recycle bin.
The data is always available while the materialized view is being refreshed. Long running queries running across the out-of-place refresh will continue to run correctly, at least until the data blocks used to hold the dropped table are overwritten, when an "ORA_08103: object no longer exists" error will be raised (again, see Jonathan Lewis's 12c MView refresh blog).
The table is populated in conventional rather than direct path mode. Although the indexes are built after population. This has a number of consequences.
  • The performance of the conventional insert will not be as good as direct-path insert.
  • Simple table compression will not have any effect. HCC is probably not an option.
  • Table statistics are not automatically collected by On-Line Statistics Gathering, but the indexes will have statistics computed as they are rebuilt during each refresh.

Thursday, June 16, 2016

Diagnosing Non-Intuitive Errors with Errorstack

This is a story about how to diagnose a problem where the error message did not seem to make sense, nor have any apparent relation to what I thought was happening.

The Problem 

During a PeopleSoft on Exadata Proof-of-concept test, running on Oracle 11.2.0.4, we got this error in a COBOL process in PeopleSoft North American Payroll.
Application Program Failed
 Action Type     : SQL SELECT
 In Pgm Section  : SQLRT: EXECUTE-STMT
 With Return Code: 31011
 Error Message   : ORA-31011: XML parsing failed ORA-19202: Error occurred in XML processing LPX-00210: expected '<' instead of '�' Error at line 1
 Stored Stmt     : PSPEBUPD_S_BENF_NO
 SQL Statement   : SELECT A.BENEFIT_RCD_NBR FROM PS_PAY_EARNINGS A WHERE A.COMP
                   ANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3 AND A.OFF_CYCLE
                   =:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6�
There is no XML processing anywhere near this process. The table only has character, date and numeric columns. No CLOBs or XML data types.  The error didn't make any sense to me at the time.

Initial Attempts at Diagnosis 

So, I generated both PeopleSoft and Oracle session traces for the process. This is from an Oracle session trace – I don’t see any bind variables in the trace for this statement, but I do for other statements in the same session. The ORA-31011 error definitely occurs during the SQL parse.
…
=====================
PARSING IN CURSOR #140203930927312 len=157 dep=0 uid=139 oct=3 lid=139 tim=1466008802922936 hv=3840857025 ad='345a5f148' sqlid='4q8nzfzkfxny1'
SELECT A.BENEFIT_RCD_NBR FROM PS_PAY_EARNINGS A WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3 AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
END OF STMT
PARSE #140203930927312:c=0,e=178,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1466008802922936
WAIT #140203930927312: nam='library cache lock' ela= 89 handle address=13956911600 lock address=14075795680 100*mode+namespace=955359640485890 obj#=229711 tim=1466008802923114
WAIT #140203930927312: nam='library cache pin' ela= 82 handle address=13956911600 pin address=14075795424 100*mode+namespace=955359640485890 obj#=229711 tim=1466008802923249
=====================
PARSE ERROR #140203930927312:len=158 dep=0 uid=139 oct=3 lid=139 tim=1466008802924036 err=31011
SELECT A.BENEFIT_RCD_NBR FROM PS_PAY_EARNINGS A WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3 AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
…
I found I could reproduce it in SQL*Plus. I tried an explain plan in SQL*Plus so it would just parse the statement
SQL> explain plan for
  2  SELECT A.BENEFIT_RCD_NBR
  3  FROM PS_PAY_EARNINGS A
  4  WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3
  5  AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
  6  /
FROM PS_PAY_EARNINGS A
     *
ERROR at line 3:
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00210: expected '<' instead of '┐'
Error at line 1
It produced a perfectly innocuous execution plan
Plan hash value: 1063404702
------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |     1 |    79 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| PSAPAY_EARNINGS |     1 |    79 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / A@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$1" "A"@"SEL$1" ("PS_PAY_EARNINGS"."COMPANY"
              "PS_PAY_EARNINGS"."PAYGROUP" "PS_PAY_EARNINGS"."PAY_END_DT"
              "PS_PAY_EARNINGS"."OFF_CYCLE" "PS_PAY_EARNINGS"."PAGE_NUM"
              "PS_PAY_EARNINGS"."LINE_NUM" "PS_PAY_EARNINGS"."SEPCHK"
              "PS_PAY_EARNINGS"."ADDL_NBR" "PS_PAY_EARNINGS"."PAY_LINE_STATUS"
              "PS_PAY_EARNINGS"."OK_TO_PAY" "PS_PAY_EARNINGS"."SINGLE_CHECK_USE"
              "PS_PAY_EARNINGS"."EMPLID" "PS_PAY_EARNINGS"."EMPL_RCD"
              "PS_PAY_EARNINGS"."BENEFIT_RCD_NBR"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.4')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."COMPANY"=:1 AND "A"."PAYGROUP"=:2 AND
              "A"."PAY_END_DT"=:3 AND "A"."OFF_CYCLE"=:4 AND "A"."EMPLID"=:5 AND
              "A"."EMPL_RCD"=TO_NUMBER(:6))
       filter("A"."EMPLID"=:5 AND "A"."EMPL_RCD"=TO_NUMBER(:6))
But then if I used different bind variable names so I could specify variables in SQL*Plus and assign values the statement no longer produced the error.
SQL> explain plan for
  2  SELECT A.BENEFIT_RCD_NBR FROM PS_PAY_EARNINGS A
  3  WHERE A.COMPANY=:b1 AND A.PAYGROUP=:b2 AND A.PAY_END_DT=:b3
  4  AND A.OFF_CYCLE=:b4 AND A.EMPLID=:b5 AND A.EMPL_RCD=:b6
  5  /

Explained.
Or, if I tried a different numeric column, again no error.
SQL> explain plan for
  2  SELECT A.SHIFT_RT
  3  FROM PS_PAY_EARNINGS A
  4  WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3
  5  AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
  6  /

Explained.
I wondered if the table was corrupt in some way and tried rebuilding it, but that didn't make any difference.
I couldn't decide if this error means the bind variables haven’t been set up properly in COBOL (I had already worked through other COBOL issues), or whether there really is a problem in the database.
I will also say that I had changed the character set on this database from WE8MSWIN1252 to AL32UTF8.
In fact there is a clue to what was actually going on in what I have already described, but I had got too close to the problem to see it. At this point I asked some colleagues to take a look and two very good suggestions were made. One was to generate an error stack to investigate the error. I will tell you about the other one later.

Error Stack 

I enabled, at session level, an event to generate an error stack on the error. I also added an identifier to the trace file name to make it easier to find the file.
alter session set tracefile_identifier=ERRORSTACK;
alter session set events '31011 trace name errorstack level 3';
I reproduced the error and got this trace file. It has the standard trace header information.
Trace file /u01/app/oracle/diag/rdbms/xxxxxx/XXXXXX1/trace/XXXXXX1_ora_16657_ERRORSTACK.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/dbhome_1
System name: Linux
Node name: xxxxxxxxx.enkitec.local
Release: 2.6.39-400.250.4.el6uek.x86_64
Version: #1 SMP Tue Jun 2 14:50:33 PDT 2015
Machine: x86_64
Instance name: XXXXXX1
Redo thread mounted by this instance: 1
Oracle process number: 57
Unix process pid: 16657, image: oracle@xxxxxxxx.enkitec.local

*** 2016-06-16 03:46:31.677
*** SESSION ID:(56.4425) 2016-06-16 03:46:31.677
*** CLIENT ID:() 2016-06-16 03:46:31.677
*** SERVICE NAME:(XXXXXX) 2016-06-16 03:46:31.677
*** MODULE NAME:(SQL*Plus) 2016-06-16 03:46:31.677
*** ACTION NAME:() 2016-06-16 03:46:31.677
Then we get the error message and problem SQL statement
*** 2016-06-16 03:46:31.677
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00210: expected '<' instead of '¯'
Error at line 1
----- Current SQL Statement for this session (sql_id=0vzbgvwh3mr4v) -----
explain plan for
SELECT A.BENEFIT_RCD_NBR
FROM PS_PAY_EARNINGS A
WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3
AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
Next comes the stack trace. The list of C functions in the Oracle kernel that have been called when the error occurs. The stack trace starts with the most deeply nested function and works up towards the main() function.
We have an XML error, so let's see if there is anything in the stack about XML. I have marked them in bold to make it easier.

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
ksedst1()+103        call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
ksedst()+39          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
dbkedDefDump()+2746  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
ksedmp()+41          call     dbkedDefDump()       000000003 ? 000000000 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
dbkdaKsdActDriver()  call     ksedmp()             000000003 ? 000000000 ?
+1960                                              7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
dbgdaExecuteAction(  call     dbkdaKsdActDriver()  7FE43E28E730 ? 7FFFF4C32C80 ?
)+1065                                             7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
dbgdaRunAction()+81  call     dbgdaExecuteAction(  7FE43E28E730 ? 00A600760 ?
5                             )                    0020C0003 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbgdRunActions()+64  call     dbgdaRunAction()     7FE43E28E730 ? 000000005 ?
                                                   0020C0003 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbgdProcessEventAct  call     dbgdRunActions()     7FE43E28E730 ? 000000005 ?
ions()+656                                         0020C0003 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbgdChkEventKgErr()  call     dbgdProcessEventAct  7FE43E28E730 ? 00C0CC9E0 ?
+1658                         ions()               7FE43A453998 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbkdChkEventRdbmsEr  call     dbgdChkEventKgErr()  7FE43E28E730 ? 00C0CC9E0 ?
r()+61                                             7FE43A453998 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
ksfpec()+66          call     dbkdChkEventRdbmsEr  7FE43E28E730 ? 00C0CC9E0 ?
                              r()                  7FE43A453998 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbgePostErrorKGE()+  call     ksfpec()             7FE43E28E730 ? 00C0CC9E0 ?
1137                                               7FE43A453998 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   00C0CC9E0 ? 7FE43B3B4920 ?
71                                                 000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
kgeselv()+276        call     dbkePostKGE_kgsf()   00C0CC9E0 ? 7FE43B3B4920 ?
                                                   000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
kgesecl0()+139       call     kgeselv()            00C0CC9E0 ? 7FE43B3B4920 ?
                                                   000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000000 ?
qmxErrHandler()+860  call     kgesecl0()           00C0CC9E0 ? 7FE43B3B4920 ?
                                                   000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 00000000A ?
XmlErrOut()+61       call     qmxErrHandler()      7FFFF4C36DE0 ? 7FE43B3B4920 ?
                                                   000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 00000000A ?
LpxErrMsg()+223      call     XmlErrOut()          7FE43A7BFBF8 ? 7FE43B3B4920 ?
                                                   7FFFF4C343E0 ? 7FFFF4C32C80 ?
                                                   000000001 ? 00000000A ?
LpxFSMParseProlog()  call     LpxErrMsg()          7FE43A7BFBF8 ? 7FE43B3B4920 ?
+12252                                             00000003C ? 000E3B1AF ?
                                                   000000000 ? 0000000E3 ?
LpxFSMParseDocument  call     LpxFSMParseProlog()  00287CB91 ? 7FE4219243E0 ?
()+296                                             00000003C ? 000E3B1AF ?
                                                   000000000 ? 0000000E3 ?
LpxParse()+2304      call     LpxFSMParseDocument  7FE43B275BC8 ? 7FE4219243E0 ?
                              ()                   00000003C ? 000E3B1AF ?
                                                   000000000 ? 0000000E3 ?
LpxBufferParse()+49  call     LpxParse()           7FE43B275BC8 ? 000000003 ?
                                                   7FE4219248F0 ? 00000047D ?
                                                   000000000 ? 7FE43B3DFD78 ?
qmxuParseXobDoc_int  call     LpxBufferParse()     7FE43B275BC8 ? 000000003 ?
()+2968                                            7FE4219248F0 ? 00000047D ?
                                                   000000000 ? 7FE43B3DFD78 ?
qmxManifest0()+1871  call     qmxuParseXobDoc_int  00C0CC9E0 ? 2DBE2A518 ?
                              ()                   7FE4219248F0 ? 00000047D ?
                                                   000000000 ? 7FE43B3DFD78 ?
qmxManifest()+95     call     qmxManifest0()       00C0CC9E0 ? 2DBE2A518 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ? 000000000 ?
qmxIterInit()+183    call     qmxManifest()        00C0CC9E0 ? 2DBE2A518 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ? 000000000 ?
qmxGetNodeByNameCS(  call     qmxIterInit()        00C0CC9E0 ? 7FFFF4C38538 ?
)+191                                              2DBE2A518 ? 00000001E ?
                                                   000000001 ? 000000000 ?
qolTextXmlStringToH  call     qmxGetNodeByNameCS(  00C0CC9E0 ? 2DBE2A518 ?
intList()+140                 )                    2DBE2A518 ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
qsmoGetSqlProfHints  call     qolTextXmlStringToH  2DB891498 ? 7FE4219248F0 ?
()+402                        intList()            00000047D ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
opiSem()+17280       call     qsmoGetSqlProfHints  2DB891498 ? 7FE4219248F0 ?
                              ()                   00000047D ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
opiDeferredSem()+40  call     opiSem()             7FFFF4C3B1A0 ? 345825198 ?
0                                                  0000000AF ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
opitca()+250         call     opiDeferredSem()     7FFFF4C3B1A0 ? 000000000 ?
                                                   0000000AF ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
kksFullTypeCheck()+  call     opitca()             7FE43AF06BD8 ? 2DB8915E8 ?
69                                                 7FFFF4C3D7C0 ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
rpiswu2()+1776       call     kksFullTypeCheck()   7FFFF4C3C8C0 ? 2DB8915E8 ?
                                                   7FFFF4C3D7C0 ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
kksSetBindType()+21  call     rpiswu2()            34C302480 ? 00000008B ?
92                                                 302E454AC ? 000000025 ?
                                                   000000000 ? 00C0CC9E0 ?
kksfbc()+11135       call     kksSetBindType()     7FFFF4C3D7C0 ? 7FE43AF06BD8 ?
                                                   7FFFF4C3D850 ? 000000102 ?
                                                   000000014 ? 000000000 ?
opiexe()+2330        call     kksfbc()             7FE43AF06BD8 ? 000000003 ?
                                                   000000102 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kpoal8()+2118        call     opiexe()             000000049 ? 000000003 ?
                                                   000000102 ? 000000000 ?
                                                   000000000 ? 000000000 ?
opiodr()+917         call     kpoal8()             00000005E ? 000000003 ?
                                                   000000102 ? 000000000 ?
                                                   000000000 ? 000000000 ?
ttcpip()+2183        call     opiodr()             00000005E ? 00000001C ?
                                                   7FFFF4C41FD0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
opitsk()+1710        call     ttcpip()             00C0EA630 ? 00999BB10 ?
                                                   7FFFF4C41FD0 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
opiino()+969         call     opitsk()             00C0EA638 ? 000000000 ?
                                                   7FFFF4C41FD0 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
opiodr()+917         call     opiino()             00000003C ? 000000004 ?
                                                   7FFFF4C437C8 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
opidrv()+570         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFFF4C437C8 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
sou2o()+103          call     opidrv()             00000003C ? 000000004 ?
                                                   7FFFF4C437C8 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
opimai_real()+133    call     sou2o()              7FFFF4C437A0 ? 00000003C ?
                                                   000000004 ? 7FFFF4C437C8 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
ssthrdmain()+265     call     opimai_real()        000000002 ? 7FFFF4C43990 ?
                                                   000000004 ? 7FFFF4C437C8 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
main()+201           call     ssthrdmain()         000000002 ? 7FFFF4C43990 ?
                                                   000000001 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
__libc_start_main()  call     main()               000000002 ? 7FFFF4C43B38 ?
+253                                               000000001 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
_start()+41          call     __libc_start_main()  000A29134 ? 000000002 ?
                                                   7FFFF4C43B28 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
There is a call to a function called qsmoGetSqlProfHints(). So do we have SQL Profiles?
The other suggestion made by independently by two colleagues was to look at whether there was a SQL Profile in place. A search on Oracle Support produced the following:
In short, the character set conversion has corrupted the SQL Profiles, and that is causing the error.

Fixing the Profiles 

I checked and we did have profiles. So I moved them to a different category.
BEGIN
 FOR i IN (select * from dba_sql_profiles WHERE category = 'DEFAULT') LOOP
  dbms_sqltune.ALTER_SQL_PROFILE(name=>i.name, attribute_name => 'CATEGORY', value=>'CORRUPT');
 END LOOP;
END;
/
The error ceased, but I could reproduce it if I set
ALTER SESSION SET SQLTUNE_CATEGORY = 'CORRUPT';
So, there is no question but that the profiles have been corrupted in some way.  When I changed the SQL statement in the explain command, it did not error because the profile did not match.
So, I disabled the profiles to make certain they can't be used.
BEGIN
 FOR i IN (select * from dba_sql_profiles WHERE category = 'CORRUPT') LOOP
  dbms_sqltune.ALTER_SQL_PROFILE(name=>i.name, attribute_name => 'STATUS', value=>'DISABLED');
 END LOOP;
END;
/
However, we need the profiles from the original system to maintain the performance assured by the profiles, so I renamed the corrupt profiles and migrated the correct profiles per Doc ID 457531.1: How to Move SQL Profiles from One Database to Another (Including to Higher Versions).
Now, when I produce an execution plan for the problem statement I can see there is a SQL*Profile referenced in the note at the bottom of the execution plan
Plan hash value: 1063404702
------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |     1 |    79 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| PSAPAY_EARNINGS |     1 |    79 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / A@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$1" "A"@"SEL$1" ("PS_PAY_EARNINGS"."COMPANY"
              "PS_PAY_EARNINGS"."PAYGROUP" "PS_PAY_EARNINGS"."PAY_END_DT"
              "PS_PAY_EARNINGS"."OFF_CYCLE" "PS_PAY_EARNINGS"."PAGE_NUM"
              "PS_PAY_EARNINGS"."LINE_NUM" "PS_PAY_EARNINGS"."SEPCHK"
              "PS_PAY_EARNINGS"."ADDL_NBR" "PS_PAY_EARNINGS"."PAY_LINE_STATUS"
              "PS_PAY_EARNINGS"."OK_TO_PAY" "PS_PAY_EARNINGS"."SINGLE_CHECK_USE"
              "PS_PAY_EARNINGS"."EMPLID" "PS_PAY_EARNINGS"."EMPL_RCD"
              "PS_PAY_EARNINGS"."BENEFIT_RCD_NBR"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.4')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."COMPANY"=:1 AND "A"."PAYGROUP"=:2 AND
              "A"."PAY_END_DT"=:3 AND "A"."OFF_CYCLE"=:4 AND "A"."EMPLID"=:5 AND
              "A"."EMPL_RCD"=TO_NUMBER(:6))
       filter("A"."EMPLID"=:5 AND "A"."EMPL_RCD"=TO_NUMBER(:6))

Note
-----
   - SQL profile "SYS_SQLPROF_014b572bf5550000" used for this statement

Footnote

If I flushed the shared pool and reproduced the error with session trace running, then the trace file did show some recursive SQL associated with reading SQL Profiles from the data dictionary, but this SQL was parsed and cached when the error occurred in a COBOL process emitting trace.

Conclusions

  • Don't suffer in silence. Describe your problem properly to your colleagues/a list server/Oracle support as appropriate. Often, just describing it to someone else can help you to see it more clearly. 
  • The error stack may shed light where other techniques leave you stumped. Session trace, so often the rightly instinctive for choice for such analyses, did not reveal anything in this case. My thanks to Frits Hoogland for suggesting this approach. 
  • This was not the first problem we experienced in this database due to changing from a single-byte character set to Unicode (AL32UTF8). Perhaps we should have gone back and recreated the database as a Unicode database and reimported the data.