Tuesday, June 02, 2020

Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 2. Reading from Object Storage with an External Table

This blog is the second in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

Create A Credential 

First, I need to create a credential that the database will use to connect to the OCI Object Storage. This is not the same as the credential that the OCI CLI used to connect.
In the OCI interface navigate to Identity ➧ Users ➧ User Details, and create an Authentication Token.
It is important to copy the token at this point because you will not see it again.
Now you can put the token into a database credential.
connect admin/Password2020@gofaster1b_tp 
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'MY_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
    password=> 'K7xfi-mG<1z:dq code="" end="" m="">
Note: The visibility of the bucket that I created earlier is private by default. Therefore, I can only access it with an authenticated user. If I were to create a credential for an unauthenticated user, it could only be accessed as public bucket. Otherwise, I would obtain an error.
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-20404: Object not found -
https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-202005
05-1552/o/dba_hist_active_sess_history.txt.gz
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 964
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD_INTERNAL", line 3891
ORA-06512: at line 1

Create an External Table

In my blog Reading the Active Session History Compressed Export File in eDB360/SQLd360 as an External Table, I showed how to create an external table to read a compressed file.  Now I am going to do the same thing as, except that now I am going to read it from OCI Object Storage into an external table created with DBMS_CLOUD.
  • I have to provide a list of columns in the external table and a list of fields in the flat file.
  • N.B. Some column names end in a # symbol.  These must be put in double-quotes in the field list though this is not needed in the column list.
  • The Access Parameters section of the ORACLE_LOADER access driver that I used to create the external table becomes contents the format parameter.  I have created a JSON object to hold the various parameters.  The parameters are not exactly the same, in fact, I have added some.  See also DBMS_CLOUD Package Format Options
DROP TABLE ash_hist PURGE;
BEGIN
   DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name =>'ASH_HIST',
    credential_name =>'MY_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    format => json_object('blankasnull'      value 'true'
                         ,'compression'      value 'gzip'
                         ,'dateformat'       value 'YYYY-MM-DD/HH24:mi:ss'
                         ,'timestampformat'  value 'YYYY-MM-DD/HH24:mi:ss.ff'
                         ,'delimiter'        value '<,>'
                         ,'ignoreblanklines' value 'true'
                         ,'rejectlimit'      value '10'
                         ,'removequotes'     value 'true'
                         ,'trimspaces'       value 'lrtrim'
                         ),
    column_list => 'SNAP_ID NUMBER
,DBID NUMBER
,INSTANCE_NUMBER NUMBER
,SAMPLE_ID NUMBER
,SAMPLE_TIME TIMESTAMP(3)
,SESSION_ID NUMBER
,SESSION_SERIAL# NUMBER
,SESSION_TYPE VARCHAR2(10)
,FLAGS NUMBER
,USER_ID NUMBER
-----------------------------------------
,SQL_ID VARCHAR2(13)
,IS_SQLID_CURRENT VARCHAR2(1)
,SQL_CHILD_NUMBER NUMBER
,SQL_OPCODE NUMBER
,SQL_OPNAME VARCHAR2(64)
,FORCE_MATCHING_SIGNATURE NUMBER
,TOP_LEVEL_SQL_ID VARCHAR2(13)
,TOP_LEVEL_SQL_OPCODE NUMBER
,SQL_PLAN_HASH_VALUE NUMBER
,SQL_FULL_PLAN_HASH_VALUE NUMBER 
-----------------------------------------
,SQL_ADAPTIVE_PLAN_RESOLVED NUMBER
,SQL_PLAN_LINE_ID NUMBER
,SQL_PLAN_OPERATION VARCHAR2(64)
,SQL_PLAN_OPTIONS VARCHAR2(64)
,SQL_EXEC_ID NUMBER
,SQL_EXEC_START DATE
,PLSQL_ENTRY_OBJECT_ID NUMBER
,PLSQL_ENTRY_SUBPROGRAM_ID NUMBER
,PLSQL_OBJECT_ID NUMBER
,PLSQL_SUBPROGRAM_ID NUMBER
-----------------------------------------
,QC_INSTANCE_ID NUMBER
,QC_SESSION_ID NUMBER
,QC_SESSION_SERIAL# NUMBER
,PX_FLAGS NUMBER
,EVENT VARCHAR2(64)
,EVENT_ID NUMBER
,SEQ# NUMBER
,P1TEXT VARCHAR2(64)
,P1 NUMBER
,P2TEXT VARCHAR2(64)
-----------------------------------------
,P2 NUMBER
,P3TEXT VARCHAR2(64)
,P3 NUMBER
,WAIT_CLASS VARCHAR2(64)
,WAIT_CLASS_ID NUMBER
,WAIT_TIME NUMBER
,SESSION_STATE VARCHAR2(7)
,TIME_WAITED NUMBER
,BLOCKING_SESSION_STATUS VARCHAR2(11)
,BLOCKING_SESSION NUMBER
-----------------------------------------
,BLOCKING_SESSION_SERIAL# NUMBER
,BLOCKING_INST_ID NUMBER
,BLOCKING_HANGCHAIN_INFO VARCHAR2(1)
,CURRENT_OBJ# NUMBER
,CURRENT_FILE# NUMBER
,CURRENT_BLOCK# NUMBER
,CURRENT_ROW# NUMBER
,TOP_LEVEL_CALL# NUMBER
,TOP_LEVEL_CALL_NAME VARCHAR2(64)
,CONSUMER_GROUP_ID NUMBER
-----------------------------------------
,XID RAW(8)
,REMOTE_INSTANCE# NUMBER
,TIME_MODEL NUMBER
,IN_CONNECTION_MGMT VARCHAR2(1)
,IN_PARSE VARCHAR2(1)
,IN_HARD_PARSE VARCHAR2(1)
,IN_SQL_EXECUTION VARCHAR2(1)
,IN_PLSQL_EXECUTION VARCHAR2(1)
,IN_PLSQL_RPC VARCHAR2(1)
,IN_PLSQL_COMPILATION VARCHAR2(1)
-----------------------------------------
,IN_JAVA_EXECUTION VARCHAR2(1)
,IN_BIND VARCHAR2(1)
,IN_CURSOR_CLOSE VARCHAR2(1)
,IN_SEQUENCE_LOAD VARCHAR2(1)
,IN_INMEMORY_QUERY VARCHAR2(1) /*added 12.1*/
,IN_INMEMORY_POPULATE VARCHAR2(1) /*added 12.1*/
,IN_INMEMORY_PREPOPULATE VARCHAR2(1) /*added 12.1*/
,IN_INMEMORY_REPOPULATE VARCHAR2(1) /*added 12.1*/
,IN_INMEMORY_TREPOPULATE VARCHAR2(1) /*added 12.1*/
,CAPTURE_OVERHEAD VARCHAR2(1)
-----------------------------------------
,REPLAY_OVERHEAD VARCHAR2(1)
,IS_CAPTURED VARCHAR2(1)
,IS_REPLAYED VARCHAR2(1)
,SERVICE_HASH NUMBER
,PROGRAM VARCHAR2(64)
,MODULE VARCHAR2(64)
,ACTION VARCHAR2(64)
,CLIENT_ID VARCHAR2(64)
,MACHINE VARCHAR2(64)
,PORT NUMBER
-----------------------------------------
,ECID VARCHAR2(64)
,DBREPLAY_FILE_ID NUMBER /*added 12.1*/
,DBREPLAY_CALL_COUNTER NUMBER /*added 12.1*/
,TM_DELTA_TIME NUMBER
,TM_DELTA_CPU_TIME NUMBER
,TM_DELTA_DB_TIME NUMBER
,DELTA_TIME NUMBER
,DELTA_READ_IO_REQUESTS NUMBER
,DELTA_WRITE_IO_REQUESTS NUMBER
,DELTA_READ_IO_BYTES NUMBER
-----------------------------------------
,DELTA_WRITE_IO_BYTES NUMBER
,DELTA_INTERCONNECT_IO_BYTES NUMBER
,PGA_ALLOCATED NUMBER
,TEMP_SPACE_ALLOCATED NUMBER
,DBOP_NAME VARCHAR2(64) /*added 12.1*/
,DBOP_EXEC_ID NUMBER /*added 12.1*/
,CON_DBID NUMBER /*added 12.1*/
,CON_ID NUMBER /*added 12.1*/'
-----------------------------------------
,field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START ,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES 
,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID'
);
END;
/
This file contains 1.4M rows in a 200Mb compressed file. If uncompressed it would be 4.6Gb. It takes about 81 seconds to perform a full scan on it.
set autotrace on timi on pages 99 lines 160
break on report
compute sum of ash_secs on report
column event format a40
column min(sample_time) format a22
column max(sample_time) format a22
select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
from ash_hist
--where rownum <= 1000
group by event
order by ash_Secs desc
;
EVENT                                      ASH_SECS MIN(SAMPLE_TIME)       MAX(SAMPLE_TIME)
---------------------------------------- ---------- ---------------------- ----------------------
                                           10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
direct path read                            3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
SQL*Net more data to client                  269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
direct path write temp                        32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
gc cr block busy                              24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804
…
latch: gc element                                10 30-MAR-20 18.42.51.748 30-MAR-20 18.42.51.748
                                         ----------
sum                                        14093050

86 rows selected.

Elapsed: 00:01:21.17
We can see from the plan that it full scanned the external table in parallel.
Execution Plan
----------------------------------------------------------
Plan hash value: 4220750095

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |  8344K|   374M|  1417  (33)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)                 | :TQ10002 |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,02 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY                     |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE                       |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,02 | PCWP |            |
|   5 |      PX SEND RANGE                   | :TQ10001 |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,01 | P->P | RANGE      |
|   6 |       HASH GROUP BY                  |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,01 | PCWP |            |
|   7 |        PX RECEIVE                    |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,01 | PCWP |            |
|   8 |         PX SEND HASH                 | :TQ10000 |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   9 |          HASH GROUP BY               |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,00 | PCWP |            |
|  10 |           PX BLOCK ITERATOR          |          |  8344K|   374M|  1089  (13)| 00:00:01 |  Q1,00 | PCWC |            |
|  11 |            EXTERNAL TABLE ACCESS FULL| ASH_HIST |  8344K|   374M|  1089  (13)| 00:00:01 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit


Statistics
----------------------------------------------------------
       2617  recursive calls
          3  db block gets
       2751  consistent gets
          0  physical reads
        728  redo size
       5428  bytes sent via SQL*Net to client
        602  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
        346  sorts (memory)
          0  sorts (disk)
         86  rows processed
In the next post, I will explain how to copy the data directly from Object Storage into a regular table.

No comments :