Tuesday, August 21, 2018

Parallel Execution of PL/SQL

A recent experience with text searching showed up some limitations on parallel execution of PL/SQL functions, and how to work around them.

Text Searching can be CPU Intensive?

It started with a requirement to find names that roughly matched a search string. The roughness of the match was determined by calculating the Levenshtein distance between each customer name and the search string. Informally, the Levenshtein distance is a mathematical function that calculates the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. Oracle has implemented it as a function in a delivered package UTL_MATCH.EDIT_DISTANCE().  This function isn't directly relevant to the rest of the discussion about parallelism, except that it is an example of CPU intensive processing that doesn't alter the database where parallel execution may be a legitimate tactic for improving performance.
The examples in this article use the SH schema in the Oracle sample schemas (available on Github).

The following query searches case-insensitively for customer names within a Levenshtein distance of 3 from 'Stephen'. It finds 'Steven' and 'Staphany'.
set autotrace on timi on pages 99 lines 200
with x as (
select c.cust_first_name
,      utl_match.edit_distance(upper(cust_first_name),'STEPHEN') edit_distance
from customers c
)
select * from x
where edit_distance <= 3
/

CUST_FIRST_NAME      EDIT_DISTANCE
-------------------- -------------
Staphany                         3
Steven                           2
Steven                           2
However, to do so Oracle had to full scan the table and execute the function for every row in the table.
Plan hash value: 2008213504
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  2775 | 19425 |   429   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |  2775 | 19425 |   429   (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("UTL_MATCH"."EDIT_DISTANCE"(UPPER("CUST_FIRST_NAME"),'STEPHEN')<=3)
Oracle implemented Levenshtein as a C function that is then called from PL/SQL, so it just consumes CPU and doesn't do anything else to the database. You can start to see why the user complained that the query with UTL_MATCH was slow.
However, the first question is how much time do we spend on the full scan and how much time do we spend executing this function?

Follow the Time with Instrumentation 

For test purposes, I am going to build my own packaged functions with session instrumentation. Then I can use Active Session History (ASH) to work out where the time went.
NB: ASH requires a licence for the Diagnostics Pack
  • One function levenshtein() calls to UTL_MATCH.EDIT_DISTANCE().
  • The other dolittle() is a control function that does nothing except instrumentation. It is used to measure the intrusion effect of the instrumentation.
CREATE OR REPLACE package dmk AS
function levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
function dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
END;
/

CREATE OR REPLACE package body dmk AS
function levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER IS
  l_distance INTEGER;
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
BEGIN
  dbms_application_info.read_module(l_module,l_action); /*save current module/action*/
  dbms_application_info.set_action('levenshtein()'); /*set action for function*/
  l_distance := utl_match.edit_distance(UPPER(p1),UPPER(p2));
  dbms_application_info.set_action(l_action); /*restore previous action*/
  RETURN l_distance;
END levenshtein;

function dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER IS
  l_distance INTEGER;
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
BEGIN
  dbms_application_info.read_module(l_module,l_action);
  dbms_application_info.set_action('dolittle()');
  l_distance := 1;
  dbms_application_info.set_action(l_action);
  RETURN l_distance;
END dolittle;
END dmk;
/
Now, I will run a test query that executes UTL_MATCH for each of the 55500 rows in the CUSTOMERS table and executes that all 10 times inside a PL/SQL block.
set timi on autotrace on lines 500 serveroutput on
DECLARE
  l_counter INTEGER := 0;
BEGIN
  dbms_application_info.set_module('DMK LEVENSHTEIN TEST',null);
  FOR j IN 1..10 LOOP
    FOR i IN (
      select sum(dmk.levenshtein(cust_first_name,'STEPHEN')) a
      ,      sum(dmk.dolittle(cust_first_name,'STEPHEN')) b
      from   customers c
    ) LOOP
      l_counter := l_counter+i.b;
    END LOOP;
  END LOOP;
  dbms_application_info.set_module(null,null);
  dbms_output.put_line('Executions: '||l_counter);
END;
/
Executions: 555000

PL/SQL procedure successfully completed.

Elapsed: 00:00:25.08
We can see that the Levenshtein function was executed 555000 times.  Now I will query the ASH for this test and group it by ACTION.
set timi on autotrace off lines 500
column module format a25
column action format a25
select module, action, sql_id, sum(1) ash_Secs
from v$active_Session_History
where module = 'DMK LEVENSHTEIN TEST'
and sample_time >= SYSDATE-1/1440
group by module, action, sql_id
/

MODULE                    ACTION                    SQL_ID          ASH_SECS
------------------------- ------------------------- ------------- ----------
DMK LEVENSHTEIN TEST      dolittle()                7gp0w6qdvxrd2          2
DMK LEVENSHTEIN TEST      levenshtein()             7gp0w6qdvxrd2          3
DMK LEVENSHTEIN TEST                                7gp0w6qdvxrd2         20
The runtime of the Levenshtein function took 3 seconds, and the function that does nothing except instrumentation is 2 seconds, so the overhead of UTL_MATCH is only about 1 second, and there are 20 seconds in the SQL. In this test, the overhead of Levenshtein is low, but it would still be worth doing the full scan in parallel.

I Can't Get No Parallelism 

But there is a problem! I specified parallelism with a hint, but I don't get a parallel plan.
select /*+PARALLEL(C 4)*/ avg(utl_match.edit_distance(UPPER(cust_first_name),'STEPHEN'))
from customers c
/

Plan hash value: 1978308596
---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |     1 |     7 |   115   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE     |           |     1 |     7 |            |          |
|   2 |   SORT AGGREGATE    |           |     1 |     7 |            |          |
|   3 |    TABLE ACCESS FULL| CUSTOMERS | 55500 |   379K|   115   (0)| 00:00:01 |
---------------------------------------------------------------------------------
However, if I use a simple SQL function, then I do get parallelism.
select /*+PARALLEL(C 4)*/ max(cust_first_name)
from customers c
/

Plan hash value: 1221513835
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |     1 |     7 |   115   (0)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE        |           |     1 |     7 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |           |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000  |     1 |     7 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |           |     1 |     7 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |           | 55500 |   379K|   115   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| CUSTOMERS | 55500 |   379K|   115   (0)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 4 because of table property
So, UTL_MATCH.EDIT_FUNCTION must be preventing parallelism in some way. There is a closed Oracle bug 169587070 that details exactly this problem with UTL_MATCH.

PARALLEL_ENABLE and PRAGMA RESTRICT_REFERENCES 

This is also documented default behaviour in PL/SQL. Database VLDB and Partitioning Guide: About Parallel Execution of Functions, Functions in Parallel Queries:
"A user-written function may be executed in parallel in any of the following cases: 
  • If it has been declared with the PARALLEL_ENABLE keyword
  • If it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES clause that indicates all of WNDS, RNPS, and WNPS 
  • If it is declared with CREATE FUNCTION and the system can analyze the body of the PL/SQL code and determine that the code neither writes to the database nor reads or modifies package variables" 
Oracle only wraps the bodies of their delivered package, but not the package specifications. So, you can see for yourself that the delivered UTL_MATCH package does not contain these specifications.

Workarounds 

There are various workarounds for this.
  • I could add the PARALLEL_ENABLE declarations to the functions in the package specification of the delivered UTL_MATCH package. Although it does work, I would certainly not be happy to alter any delivered Oracle package in any serious database, without approval from Oracle support.
  • Or, I could add the RESTRICT_REFERENCES pragma to the package specification. Again, although this works, it involves altering a delivered package.
  • However, I can wrap the delivered package in my own packaged function with either PARALLEL_ENABLE (my personal preferrance).
CREATE OR REPLACE package dmk AS
FUNCTION levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER PARALLEL_ENABLE;
FUNCTION dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER PARALLEL_ENABLE;
END;
/
  • Or you can use RESTRICT_REFERENCES in the package specification, but you must include the TRUST pragma to over-ride the lack of a pragma definition in the called packaged function.
CREATE OR REPLACE package dmk AS
FUNCTION levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
PRAGMA RESTRICT_REFERENCES (levenshtein,WNDS,RNDS,WNPS,RNPS,TRUST);
FUNCTION dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
PRAGMA RESTRICT_REFERENCES (do_little,WNDS,RNDS,WNPS,RNPS,TRUST);
END;
/
Now, I get parallel execution of the unaltered delivered UTL_MATCH.EDIT_DISTANCE() function.
select /*+PARALLEL(C 4) FULL(C)*/ sum(dmk.levenshtein(cust_first_name,'STEPHEN')) a
,      sum(dmk.dolittle(cust_first_name,'STEPHEN')) b
from   customers c
/

Plan hash value: 1221513835
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |     1 |     7 |   115   (0)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE        |           |     1 |     7 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |           |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000  |     1 |     7 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |           |     1 |     7 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |           | 55500 |   379K|   115   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| CUSTOMERS | 55500 |   379K|   115   (0)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 4 because of table property

Conclusion: Parallel PL/SQL 

I can now repeat the earlier test, but with a parallel hint, the runtime goes down from 25 to 9 seconds, although about the same amount of database time is recorded by ASH. So parallelism can improve the performance for the end user, but will not reduce the total CPU overhead. If anything is likely to increase overall CPU time.
set timi on lines 500
DECLARE
  l_counter INTEGER := 0;
BEGIN
  dbms_application_info.set_module('DMK LEVENSHTEIN TEST',null);
  FOR j IN 1..10 LOOP
    FOR i IN (
      select /*+PARALLEL(C 4)*/
             sum(dmk.levenshtein(cust_first_name,'STEPHEN')) a
      ,      sum(dmk.dolittle(cust_first_name,'STEPHEN')) b
      from   customers c
    ) LOOP
      l_counter := l_counter+i.b;
    END LOOP;
  END LOOP;
  dbms_application_info.set_module(null,null);
  dbms_output.put_line('Executions: '||l_counter);
END;
/
Executions: 555000

PL/SQL procedure successfully completed.

Elapsed: 00:00:09.34

set timi on autotrace off lines 500
column module format a32
column action format a32
select module, action, sql_id, sum(1) ash_Secs
from v$active_Session_History
where module = 'DMK LEVENSHTEIN TEST'
and sample_time >= SYSDATE-1/1440
group by module, action, sql_id
/

MODULE                           ACTION                           SQL_ID          ASH_SECS
-------------------------------- -------------------------------- ------------- ----------
DMK LEVENSHTEIN TEST                                              3391gqpzu5g2k         21
DMK LEVENSHTEIN TEST             levenshtein()                    3391gqpzu5g2k          5
DMK LEVENSHTEIN TEST             dolittle()                       3391gqpzu5g2k          4
NB: I have not been able to get parallelism to work in a PL/SQL function defined in a WITH clause because you cannot specify PARALLEL_ENABLE, and a pragma can only be specified in a package specification.
WITH
  FUNCTION levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER PARALLEL_ENABLE IS
  BEGIN
    RETURN utl_match.edit_distance(UPPER(p1),UPPER(p2));
  END;
select /*+PARALLEL(C 4)*/
       sum(levenshtein(cust_first_name,'STEPHEN')) a
from   customers c
/
ORA-06553: PLS-712: illegal option for subprogram LEVENSHTEIN