Thursday, June 16, 2016

Diagnosing Non-Intuitive Errors with Errorstack

This is a story about how to diagnose a problem where the error message did not seem to make sense, nor have any apparent relation to what I thought was happening.

The Problem 

During a PeopleSoft on Exadata Proof-of-concept test, running on Oracle 11.2.0.4, we got this error in a COBOL process in PeopleSoft North American Payroll.
Application Program Failed
 Action Type     : SQL SELECT
 In Pgm Section  : SQLRT: EXECUTE-STMT
 With Return Code: 31011
 Error Message   : ORA-31011: XML parsing failed ORA-19202: Error occurred in XML processing LPX-00210: expected '<' instead of '�' Error at line 1
 Stored Stmt     : PSPEBUPD_S_BENF_NO
 SQL Statement   : SELECT A.BENEFIT_RCD_NBR FROM PS_PAY_EARNINGS A WHERE A.COMP
                   ANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3 AND A.OFF_CYCLE
                   =:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6�
There is no XML processing anywhere near this process. The table only has character, date and numeric columns. No CLOBs or XML data types.  The error didn't make any sense to me at the time.

Initial Attempts at Diagnosis 

So, I generated both PeopleSoft and Oracle session traces for the process. This is from an Oracle session trace – I don’t see any bind variables in the trace for this statement, but I do for other statements in the same session. The ORA-31011 error definitely occurs during the SQL parse.
…
=====================
PARSING IN CURSOR #140203930927312 len=157 dep=0 uid=139 oct=3 lid=139 tim=1466008802922936 hv=3840857025 ad='345a5f148' sqlid='4q8nzfzkfxny1'
SELECT A.BENEFIT_RCD_NBR FROM PS_PAY_EARNINGS A WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3 AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
END OF STMT
PARSE #140203930927312:c=0,e=178,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1466008802922936
WAIT #140203930927312: nam='library cache lock' ela= 89 handle address=13956911600 lock address=14075795680 100*mode+namespace=955359640485890 obj#=229711 tim=1466008802923114
WAIT #140203930927312: nam='library cache pin' ela= 82 handle address=13956911600 pin address=14075795424 100*mode+namespace=955359640485890 obj#=229711 tim=1466008802923249
=====================
PARSE ERROR #140203930927312:len=158 dep=0 uid=139 oct=3 lid=139 tim=1466008802924036 err=31011
SELECT A.BENEFIT_RCD_NBR FROM PS_PAY_EARNINGS A WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3 AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
…
I found I could reproduce it in SQL*Plus. I tried an explain plan in SQL*Plus so it would just parse the statement
SQL> explain plan for
  2  SELECT A.BENEFIT_RCD_NBR
  3  FROM PS_PAY_EARNINGS A
  4  WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3
  5  AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
  6  /
FROM PS_PAY_EARNINGS A
     *
ERROR at line 3:
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00210: expected '<' instead of '┐'
Error at line 1
It produced a perfectly innocuous execution plan
Plan hash value: 1063404702
------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |     1 |    79 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| PSAPAY_EARNINGS |     1 |    79 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / A@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$1" "A"@"SEL$1" ("PS_PAY_EARNINGS"."COMPANY"
              "PS_PAY_EARNINGS"."PAYGROUP" "PS_PAY_EARNINGS"."PAY_END_DT"
              "PS_PAY_EARNINGS"."OFF_CYCLE" "PS_PAY_EARNINGS"."PAGE_NUM"
              "PS_PAY_EARNINGS"."LINE_NUM" "PS_PAY_EARNINGS"."SEPCHK"
              "PS_PAY_EARNINGS"."ADDL_NBR" "PS_PAY_EARNINGS"."PAY_LINE_STATUS"
              "PS_PAY_EARNINGS"."OK_TO_PAY" "PS_PAY_EARNINGS"."SINGLE_CHECK_USE"
              "PS_PAY_EARNINGS"."EMPLID" "PS_PAY_EARNINGS"."EMPL_RCD"
              "PS_PAY_EARNINGS"."BENEFIT_RCD_NBR"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.4')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."COMPANY"=:1 AND "A"."PAYGROUP"=:2 AND
              "A"."PAY_END_DT"=:3 AND "A"."OFF_CYCLE"=:4 AND "A"."EMPLID"=:5 AND
              "A"."EMPL_RCD"=TO_NUMBER(:6))
       filter("A"."EMPLID"=:5 AND "A"."EMPL_RCD"=TO_NUMBER(:6))
But then if I used different bind variable names so I could specify variables in SQL*Plus and assign values the statement no longer produced the error.
SQL> explain plan for
  2  SELECT A.BENEFIT_RCD_NBR FROM PS_PAY_EARNINGS A
  3  WHERE A.COMPANY=:b1 AND A.PAYGROUP=:b2 AND A.PAY_END_DT=:b3
  4  AND A.OFF_CYCLE=:b4 AND A.EMPLID=:b5 AND A.EMPL_RCD=:b6
  5  /

Explained.
Or, if I tried a different numeric column, again no error.
SQL> explain plan for
  2  SELECT A.SHIFT_RT
  3  FROM PS_PAY_EARNINGS A
  4  WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3
  5  AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
  6  /

Explained.
I wondered if the table was corrupt in some way and tried rebuilding it, but that didn't make any difference.
I couldn't decide if this error means the bind variables haven’t been set up properly in COBOL (I had already worked through other COBOL issues), or whether there really is a problem in the database.
I will also say that I had changed the character set on this database from WE8MSWIN1252 to AL32UTF8.
In fact there is a clue to what was actually going on in what I have already described, but I had got too close to the problem to see it. At this point I asked some colleagues to take a look and two very good suggestions were made. One was to generate an error stack to investigate the error. I will tell you about the other one later.

Error Stack 

I enabled, at session level, an event to generate an error stack on the error. I also added an identifier to the trace file name to make it easier to find the file.
alter session set tracefile_identifier=ERRORSTACK;
alter session set events '31011 trace name errorstack level 3';
I reproduced the error and got this trace file. It has the standard trace header information.
Trace file /u01/app/oracle/diag/rdbms/xxxxxx/XXXXXX1/trace/XXXXXX1_ora_16657_ERRORSTACK.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/dbhome_1
System name: Linux
Node name: xxxxxxxxx.enkitec.local
Release: 2.6.39-400.250.4.el6uek.x86_64
Version: #1 SMP Tue Jun 2 14:50:33 PDT 2015
Machine: x86_64
Instance name: XXXXXX1
Redo thread mounted by this instance: 1
Oracle process number: 57
Unix process pid: 16657, image: oracle@xxxxxxxx.enkitec.local

*** 2016-06-16 03:46:31.677
*** SESSION ID:(56.4425) 2016-06-16 03:46:31.677
*** CLIENT ID:() 2016-06-16 03:46:31.677
*** SERVICE NAME:(XXXXXX) 2016-06-16 03:46:31.677
*** MODULE NAME:(SQL*Plus) 2016-06-16 03:46:31.677
*** ACTION NAME:() 2016-06-16 03:46:31.677
Then we get the error message and problem SQL statement
*** 2016-06-16 03:46:31.677
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00210: expected '<' instead of '¯'
Error at line 1
----- Current SQL Statement for this session (sql_id=0vzbgvwh3mr4v) -----
explain plan for
SELECT A.BENEFIT_RCD_NBR
FROM PS_PAY_EARNINGS A
WHERE A.COMPANY=:1 AND A.PAYGROUP=:2 AND A.PAY_END_DT=:3
AND A.OFF_CYCLE=:4 AND A.EMPLID=:5 AND A.EMPL_RCD=:6
Next comes the stack trace. The list of C functions in the Oracle kernel that have been called when the error occurs. The stack trace starts with the most deeply nested function and works up towards the main() function.
We have an XML error, so let's see if there is anything in the stack about XML. I have marked them in bold to make it easier.

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
ksedst1()+103        call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
ksedst()+39          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
dbkedDefDump()+2746  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
ksedmp()+41          call     dbkedDefDump()       000000003 ? 000000000 ?
                                                   7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
dbkdaKsdActDriver()  call     ksedmp()             000000003 ? 000000000 ?
+1960                                              7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
dbgdaExecuteAction(  call     dbkdaKsdActDriver()  7FE43E28E730 ? 7FFFF4C32C80 ?
)+1065                                             7FFFF4C2BA00 ? 7FFFF4C2BAD8 ?
                                                   7FFFF4C30580 ? 000000002 ?
dbgdaRunAction()+81  call     dbgdaExecuteAction(  7FE43E28E730 ? 00A600760 ?
5                             )                    0020C0003 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbgdRunActions()+64  call     dbgdaRunAction()     7FE43E28E730 ? 000000005 ?
                                                   0020C0003 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbgdProcessEventAct  call     dbgdRunActions()     7FE43E28E730 ? 000000005 ?
ions()+656                                         0020C0003 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbgdChkEventKgErr()  call     dbgdProcessEventAct  7FE43E28E730 ? 00C0CC9E0 ?
+1658                         ions()               7FE43A453998 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbkdChkEventRdbmsEr  call     dbgdChkEventKgErr()  7FE43E28E730 ? 00C0CC9E0 ?
r()+61                                             7FE43A453998 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
ksfpec()+66          call     dbkdChkEventRdbmsEr  7FE43E28E730 ? 00C0CC9E0 ?
                              r()                  7FE43A453998 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbgePostErrorKGE()+  call     ksfpec()             7FE43E28E730 ? 00C0CC9E0 ?
1137                                               7FE43A453998 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   00C0CC9E0 ? 7FE43B3B4920 ?
71                                                 000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
kgeselv()+276        call     dbkePostKGE_kgsf()   00C0CC9E0 ? 7FE43B3B4920 ?
                                                   000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000002 ?
kgesecl0()+139       call     kgeselv()            00C0CC9E0 ? 7FE43B3B4920 ?
                                                   000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 000000000 ?
qmxErrHandler()+860  call     kgesecl0()           00C0CC9E0 ? 7FE43B3B4920 ?
                                                   000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 00000000A ?
XmlErrOut()+61       call     qmxErrHandler()      7FFFF4C36DE0 ? 7FE43B3B4920 ?
                                                   000007923 ? 7FFFF4C32C80 ?
                                                   000000001 ? 00000000A ?
LpxErrMsg()+223      call     XmlErrOut()          7FE43A7BFBF8 ? 7FE43B3B4920 ?
                                                   7FFFF4C343E0 ? 7FFFF4C32C80 ?
                                                   000000001 ? 00000000A ?
LpxFSMParseProlog()  call     LpxErrMsg()          7FE43A7BFBF8 ? 7FE43B3B4920 ?
+12252                                             00000003C ? 000E3B1AF ?
                                                   000000000 ? 0000000E3 ?
LpxFSMParseDocument  call     LpxFSMParseProlog()  00287CB91 ? 7FE4219243E0 ?
()+296                                             00000003C ? 000E3B1AF ?
                                                   000000000 ? 0000000E3 ?
LpxParse()+2304      call     LpxFSMParseDocument  7FE43B275BC8 ? 7FE4219243E0 ?
                              ()                   00000003C ? 000E3B1AF ?
                                                   000000000 ? 0000000E3 ?
LpxBufferParse()+49  call     LpxParse()           7FE43B275BC8 ? 000000003 ?
                                                   7FE4219248F0 ? 00000047D ?
                                                   000000000 ? 7FE43B3DFD78 ?
qmxuParseXobDoc_int  call     LpxBufferParse()     7FE43B275BC8 ? 000000003 ?
()+2968                                            7FE4219248F0 ? 00000047D ?
                                                   000000000 ? 7FE43B3DFD78 ?
qmxManifest0()+1871  call     qmxuParseXobDoc_int  00C0CC9E0 ? 2DBE2A518 ?
                              ()                   7FE4219248F0 ? 00000047D ?
                                                   000000000 ? 7FE43B3DFD78 ?
qmxManifest()+95     call     qmxManifest0()       00C0CC9E0 ? 2DBE2A518 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ? 000000000 ?
qmxIterInit()+183    call     qmxManifest()        00C0CC9E0 ? 2DBE2A518 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ? 000000000 ?
qmxGetNodeByNameCS(  call     qmxIterInit()        00C0CC9E0 ? 7FFFF4C38538 ?
)+191                                              2DBE2A518 ? 00000001E ?
                                                   000000001 ? 000000000 ?
qolTextXmlStringToH  call     qmxGetNodeByNameCS(  00C0CC9E0 ? 2DBE2A518 ?
intList()+140                 )                    2DBE2A518 ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
qsmoGetSqlProfHints  call     qolTextXmlStringToH  2DB891498 ? 7FE4219248F0 ?
()+402                        intList()            00000047D ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
opiSem()+17280       call     qsmoGetSqlProfHints  2DB891498 ? 7FE4219248F0 ?
                              ()                   00000047D ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
opiDeferredSem()+40  call     opiSem()             7FFFF4C3B1A0 ? 345825198 ?
0                                                  0000000AF ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
opitca()+250         call     opiDeferredSem()     7FFFF4C3B1A0 ? 000000000 ?
                                                   0000000AF ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
kksFullTypeCheck()+  call     opitca()             7FE43AF06BD8 ? 2DB8915E8 ?
69                                                 7FFFF4C3D7C0 ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
rpiswu2()+1776       call     kksFullTypeCheck()   7FFFF4C3C8C0 ? 2DB8915E8 ?
                                                   7FFFF4C3D7C0 ? 00000001E ?
                                                   000000000 ? 00C0CC9E0 ?
kksSetBindType()+21  call     rpiswu2()            34C302480 ? 00000008B ?
92                                                 302E454AC ? 000000025 ?
                                                   000000000 ? 00C0CC9E0 ?
kksfbc()+11135       call     kksSetBindType()     7FFFF4C3D7C0 ? 7FE43AF06BD8 ?
                                                   7FFFF4C3D850 ? 000000102 ?
                                                   000000014 ? 000000000 ?
opiexe()+2330        call     kksfbc()             7FE43AF06BD8 ? 000000003 ?
                                                   000000102 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kpoal8()+2118        call     opiexe()             000000049 ? 000000003 ?
                                                   000000102 ? 000000000 ?
                                                   000000000 ? 000000000 ?
opiodr()+917         call     kpoal8()             00000005E ? 000000003 ?
                                                   000000102 ? 000000000 ?
                                                   000000000 ? 000000000 ?
ttcpip()+2183        call     opiodr()             00000005E ? 00000001C ?
                                                   7FFFF4C41FD0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
opitsk()+1710        call     ttcpip()             00C0EA630 ? 00999BB10 ?
                                                   7FFFF4C41FD0 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
opiino()+969         call     opitsk()             00C0EA638 ? 000000000 ?
                                                   7FFFF4C41FD0 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
opiodr()+917         call     opiino()             00000003C ? 000000004 ?
                                                   7FFFF4C437C8 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
opidrv()+570         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFFF4C437C8 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
sou2o()+103          call     opidrv()             00000003C ? 000000004 ?
                                                   7FFFF4C437C8 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
opimai_real()+133    call     sou2o()              7FFFF4C437A0 ? 00000003C ?
                                                   000000004 ? 7FFFF4C437C8 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
ssthrdmain()+265     call     opimai_real()        000000002 ? 7FFFF4C43990 ?
                                                   000000004 ? 7FFFF4C437C8 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
main()+201           call     ssthrdmain()         000000002 ? 7FFFF4C43990 ?
                                                   000000001 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
__libc_start_main()  call     main()               000000002 ? 7FFFF4C43B38 ?
+253                                               000000001 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
_start()+41          call     __libc_start_main()  000A29134 ? 000000002 ?
                                                   7FFFF4C43B28 ? 000000000 ?
                                                   7FFFF4C41A28 ? 7FFFF4C41FCC ?
There is a call to a function called qsmoGetSqlProfHints(). So do we have SQL Profiles?
The other suggestion made by independently by two colleagues was to look at whether there was a SQL Profile in place. A search on Oracle Support produced the following:
In short, the character set conversion has corrupted the SQL Profiles, and that is causing the error.

Fixing the Profiles 

I checked and we did have profiles. So I moved them to a different category.
BEGIN
 FOR i IN (select * from dba_sql_profiles WHERE category = 'DEFAULT') LOOP
  dbms_sqltune.ALTER_SQL_PROFILE(name=>i.name, attribute_name => 'CATEGORY', value=>'CORRUPT');
 END LOOP;
END;
/
The error ceased, but I could reproduce it if I set
ALTER SESSION SET SQLTUNE_CATEGORY = 'CORRUPT';
So, there is no question but that the profiles have been corrupted in some way.  When I changed the SQL statement in the explain command, it did not error because the profile did not match.
So, I disabled the profiles to make certain they can't be used.
BEGIN
 FOR i IN (select * from dba_sql_profiles WHERE category = 'CORRUPT') LOOP
  dbms_sqltune.ALTER_SQL_PROFILE(name=>i.name, attribute_name => 'STATUS', value=>'DISABLED');
 END LOOP;
END;
/
However, we need the profiles from the original system to maintain the performance assured by the profiles, so I renamed the corrupt profiles and migrated the correct profiles per Doc ID 457531.1: How to Move SQL Profiles from One Database to Another (Including to Higher Versions).
Now, when I produce an execution plan for the problem statement I can see there is a SQL*Profile referenced in the note at the bottom of the execution plan
Plan hash value: 1063404702
------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |     1 |    79 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| PSAPAY_EARNINGS |     1 |    79 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / A@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$1" "A"@"SEL$1" ("PS_PAY_EARNINGS"."COMPANY"
              "PS_PAY_EARNINGS"."PAYGROUP" "PS_PAY_EARNINGS"."PAY_END_DT"
              "PS_PAY_EARNINGS"."OFF_CYCLE" "PS_PAY_EARNINGS"."PAGE_NUM"
              "PS_PAY_EARNINGS"."LINE_NUM" "PS_PAY_EARNINGS"."SEPCHK"
              "PS_PAY_EARNINGS"."ADDL_NBR" "PS_PAY_EARNINGS"."PAY_LINE_STATUS"
              "PS_PAY_EARNINGS"."OK_TO_PAY" "PS_PAY_EARNINGS"."SINGLE_CHECK_USE"
              "PS_PAY_EARNINGS"."EMPLID" "PS_PAY_EARNINGS"."EMPL_RCD"
              "PS_PAY_EARNINGS"."BENEFIT_RCD_NBR"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.4')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."COMPANY"=:1 AND "A"."PAYGROUP"=:2 AND
              "A"."PAY_END_DT"=:3 AND "A"."OFF_CYCLE"=:4 AND "A"."EMPLID"=:5 AND
              "A"."EMPL_RCD"=TO_NUMBER(:6))
       filter("A"."EMPLID"=:5 AND "A"."EMPL_RCD"=TO_NUMBER(:6))

Note
-----
   - SQL profile "SYS_SQLPROF_014b572bf5550000" used for this statement

Footnote

If I flushed the shared pool and reproduced the error with session trace running, then the trace file did show some recursive SQL associated with reading SQL Profiles from the data dictionary, but this SQL was parsed and cached when the error occurred in a COBOL process emitting trace.

Conclusions

  • Don't suffer in silence. Describe your problem properly to your colleagues/a list server/Oracle support as appropriate. Often, just describing it to someone else can help you to see it more clearly. 
  • The error stack may shed light where other techniques leave you stumped. Session trace, so often the rightly instinctive for choice for such analyses, did not reveal anything in this case. My thanks to Frits Hoogland for suggesting this approach. 
  • This was not the first problem we experienced in this database due to changing from a single-byte character set to Unicode (AL32UTF8). Perhaps we should have gone back and recreated the database as a Unicode database and reimported the data.