Wednesday, June 03, 2020

Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 3. Copying data from Object Storage to a Regular Table

This blog is the third in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

Copy Data into Table 

Alternatively, we can copy the data into a normal table. The table needs to be created in advance. This time, I am going to run the copy as user SOE rather than ADMIN.  I need to:
  • Grant CONNECT and RESOURCE privileges, and quota on the DATA tablespace.
  • Grant execute on DBMS_CLOUD to SOE, so it can execute the command.
  • Grant READ and WRITE access on the DATA_PUMP_DIR directory – the log and bad files created by this process are written to this database directory.
connect admin/Password2020!@gofaster1b_tp 
CREATE USER soe IDENTIFIED BY Password2020;
GRANT CONNECT, RESOURCE TO soe;
GRANT EXECUTE ON DBMS_CLOUD TO soe;
GRANT READ, WRITE ON DIRECTORY data_pump_dir TO soe;
ALTER USER soe QUOTA UNLIMITED ON data;
I am now going to switch to user SOE and create my table.
connect soe/Password2020@gofaster1b_tp
Drop table soe.ash_hist purge;
CREATE TABLE soe.ASH_HIST
   (    SNAP_ID NUMBER,
        DBID NUMBER,
        INSTANCE_NUMBER NUMBER,
        SAMPLE_ID NUMBER,
        SAMPLE_TIME TIMESTAMP (3),
--      SAMPLE_TIME_UTC TIMESTAMP (3),
--      USECS_PER_ROW NUMBER,
        SESSION_ID NUMBER,
        SESSION_SERIAL# NUMBER,
        SESSION_TYPE VARCHAR2(10),
        FLAGS NUMBER,
        USER_ID NUMBER,
-----------------------------------------
        SQL_ID VARCHAR2(13),
        IS_SQLID_CURRENT VARCHAR2(1),
        SQL_CHILD_NUMBER NUMBER,
        SQL_OPCODE NUMBER,
        SQL_OPNAME VARCHAR2(64),
        FORCE_MATCHING_SIGNATURE NUMBER,
        TOP_LEVEL_SQL_ID VARCHAR2(13),
        TOP_LEVEL_SQL_OPCODE NUMBER,
        SQL_PLAN_HASH_VALUE NUMBER,
        SQL_FULL_PLAN_HASH_VALUE NUMBER,
-----------------------------------------
        SQL_ADAPTIVE_PLAN_RESOLVED NUMBER,
        SQL_PLAN_LINE_ID NUMBER,
        SQL_PLAN_OPERATION VARCHAR2(64),
        SQL_PLAN_OPTIONS VARCHAR2(64),
        SQL_EXEC_ID NUMBER,
        SQL_EXEC_START DATE,
        PLSQL_ENTRY_OBJECT_ID NUMBER,
        PLSQL_ENTRY_SUBPROGRAM_ID NUMBER,
        PLSQL_OBJECT_ID NUMBER,
        PLSQL_SUBPROGRAM_ID NUMBER,
-----------------------------------------
        QC_INSTANCE_ID NUMBER,
        QC_SESSION_ID NUMBER,
        QC_SESSION_SERIAL# NUMBER,
        PX_FLAGS NUMBER,
        EVENT VARCHAR2(64),
        EVENT_ID NUMBER,
        SEQ# NUMBER,
        P1TEXT VARCHAR2(64),
        P1 NUMBER,
        P2TEXT VARCHAR2(64),
-----------------------------------------
        P2 NUMBER,
        P3TEXT VARCHAR2(64),
        P3 NUMBER,
        WAIT_CLASS VARCHAR2(64),
        WAIT_CLASS_ID NUMBER,
        WAIT_TIME NUMBER,
        SESSION_STATE VARCHAR2(7),
        TIME_WAITED NUMBER,
        BLOCKING_SESSION_STATUS VARCHAR2(11),
        BLOCKING_SESSION NUMBER,
-----------------------------------------
        BLOCKING_SESSION_SERIAL# NUMBER,
        BLOCKING_INST_ID NUMBER,
        BLOCKING_HANGCHAIN_INFO VARCHAR2(1),
        CURRENT_OBJ# NUMBER,
        CURRENT_FILE# NUMBER,
        CURRENT_BLOCK# NUMBER,
        CURRENT_ROW# NUMBER,
        TOP_LEVEL_CALL# NUMBER,
        TOP_LEVEL_CALL_NAME VARCHAR2(64),
        CONSUMER_GROUP_ID NUMBER,
-----------------------------------------
        XID RAW(8),
        REMOTE_INSTANCE# NUMBER,
        TIME_MODEL NUMBER,
        IN_CONNECTION_MGMT VARCHAR2(1),
        IN_PARSE VARCHAR2(1),
        IN_HARD_PARSE VARCHAR2(1),
        IN_SQL_EXECUTION VARCHAR2(1),
        IN_PLSQL_EXECUTION VARCHAR2(1),
        IN_PLSQL_RPC VARCHAR2(1),
        IN_PLSQL_COMPILATION VARCHAR2(1),
-----------------------------------------
        IN_JAVA_EXECUTION VARCHAR2(1),
        IN_BIND VARCHAR2(1),
        IN_CURSOR_CLOSE VARCHAR2(1),
        IN_SEQUENCE_LOAD VARCHAR2(1),
        IN_INMEMORY_QUERY VARCHAR2(1),
        IN_INMEMORY_POPULATE VARCHAR2(1),
        IN_INMEMORY_PREPOPULATE VARCHAR2(1),
        IN_INMEMORY_REPOPULATE VARCHAR2(1),
        IN_INMEMORY_TREPOPULATE VARCHAR2(1),
--      IN_TABLESPACE_ENCRYPTION VARCHAR2(1),
        CAPTURE_OVERHEAD VARCHAR2(1),
-----------------------------------------
        REPLAY_OVERHEAD VARCHAR2(1),
        IS_CAPTURED VARCHAR2(1),
        IS_REPLAYED VARCHAR2(1),
--      IS_REPLAY_SYNC_TOKEN_HOLDER VARCHAR2(1),
        SERVICE_HASH NUMBER,
        PROGRAM VARCHAR2(64),
        MODULE VARCHAR2(64),
        ACTION VARCHAR2(64),
        CLIENT_ID VARCHAR2(64),
        MACHINE VARCHAR2(64),
        PORT NUMBER,
-----------------------------------------
        ECID VARCHAR2(64),
        DBREPLAY_FILE_ID NUMBER,
        DBREPLAY_CALL_COUNTER NUMBER,
        TM_DELTA_TIME NUMBER,
        TM_DELTA_CPU_TIME NUMBER,
        TM_DELTA_DB_TIME NUMBER,
        DELTA_TIME NUMBER,
        DELTA_READ_IO_REQUESTS NUMBER,
        DELTA_WRITE_IO_REQUESTS NUMBER,
        DELTA_READ_IO_BYTES NUMBER,
-----------------------------------------
        DELTA_WRITE_IO_BYTES NUMBER,
        DELTA_INTERCONNECT_IO_BYTES NUMBER,
        PGA_ALLOCATED NUMBER,
        TEMP_SPACE_ALLOCATED NUMBER,
        DBOP_NAME VARCHAR2(64),
        DBOP_EXEC_ID NUMBER,
        CON_DBID NUMBER,
        CON_ID NUMBER,
-----------------------------------------
        CONSTRAINT ash_hist_pk PRIMARY KEY (dbid, instance_number, snap_id, sample_id, session_id)
   ) 
COMPRESS FOR QUERY LOW
/
As Autonomous Databases run on Exadata, I have also specified Hybrid Columnar Compression (HCC) for this table.
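The table-level compression setting can be confirmed from the data dictionary. This check is not part of the original script, but a simple query would be:
SELECT table_name, compression, compress_for
FROM   all_tables
WHERE  owner = 'SOE' AND table_name = 'ASH_HIST';
For this table, I would expect COMPRESSION to be ENABLED and COMPRESS_FOR to be QUERY LOW.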
Credentials are specific to the database user.  I have to create an additional credential, for the same cloud user, but owned by SOE.
ALTER SESSION SET nls_date_Format='hh24:mi:ss dd.mm.yyyy';
set serveroutput on timi on
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'SOE_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
    password=> 'K7xfi-mG<1Z:dq#88;1m'
  );
END;
/
column owner format a10
column credential_name format a20
column comments format a80
column username format a40
SELECT * FROM dba_credentials;

OWNER      CREDENTIAL_NAME      USERNAME                                 WINDOWS_DOMAIN
---------- -------------------- ---------------------------------------- ------------------------------
COMMENTS                                                                         ENABL
-------------------------------------------------------------------------------- -----
ADMIN      MY_BUCKET            oraclecloud1@go-faster.co.uk
{"comments":"Created via DBMS_CLOUD.create_credential"}                          TRUE

SOE        SOE_BUCKET           oraclecloud1@go-faster.co.uk
{"comments":"Created via DBMS_CLOUD.create_credential"}                          TRUE
The COPY_DATA procedure is similar to CREATE_EXTERNAL_TABLE described in the previous post, but it doesn't take a column list. The field names must match the column names of the target table. It is sensitive to field names with a trailing #; these must be enclosed in double-quotes.
TRUNCATE TABLE soe.ash_hist;
DECLARE
  l_operation_id NUMBER;
BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name =>'ASH_HIST',
    credential_name =>'SOE_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    schema_name => 'SOE',
    format => json_object('blankasnull'      value 'true'
                         ,'compression'      value 'gzip'
                         ,'dateformat'       value 'YYYY-MM-DD/HH24:mi:ss'
                         ,'timestampformat'  value 'YYYY-MM-DD/HH24:mi:ss.ff'
                         ,'delimiter'        value '<,>'
                         ,'ignoreblanklines' value 'true'
                         ,'rejectlimit'      value '10'
                         ,'removequotes'     value 'true'
                         ,'trimspaces'       value 'lrtrim'
                         ),
    field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES 
,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID',
    operation_id=>l_operation_id
  );
  dbms_output.put_line('Operation ID:'||l_operation_id||' finished successfully');
EXCEPTION WHEN OTHERS THEN
  dbms_output.put_line('Operation ID:'||l_operation_id||' raised an error');
  RAISE;
END;
/
The data copy takes slightly longer than the query on the external table did.
Operation ID:31 finished successfully

PL/SQL procedure successfully completed.

Elapsed: 00:02:01.11
The status of the copy operation is reported in USER_LOAD_OPERATIONS.  This includes the number of rows loaded and the names of external tables that are created for the log and bad files.
set lines 120
column type format a10
column file_uri_list format a64
column start_time format a32
column update_time format a32
column owner_name format a10
column table_name format a10
column partition_name format a10
column subpartition_name format a10
column logfile_table format a15
column badfile_table format a15
column tempext_table format a30
select * from user_load_operations where id = &operation_id;

        ID TYPE              SID    SERIAL# START_TIME                       UPDATE_TIME                      STATUS
---------- ---------- ---------- ---------- -------------------------------- -------------------------------- ---------
OWNER_NAME TABLE_NAME PARTITION_ SUBPARTITI FILE_URI_LIST                                                    ROWS_LOADED
---------- ---------- ---------- ---------- ---------------------------------------------------------------- -----------
LOGFILE_TABLE   BADFILE_TABLE   TEMPEXT_TABLE
--------------- --------------- ------------------------------
        31 COPY            19965      44088 07-MAY-20 17.03.20.328263 +01:00 07-MAY-20 17.05.36.157680 +01:00 COMPLETED
SOE        ASH_HIST                         https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu     1409305
                                            /b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz
COPY$31_LOG     COPY$31_BAD     COPY$Y2R021UKPJ5F75JCMSKL
An external table is temporarily created by the COPY_DATA procedure but is then dropped before the procedure completes.  The bad file is empty because the copy operation succeeded without error, but we can query the copy log.
select * from COPY$31_LOG;

RECORD                                                                                                                  
------------------------------------------------------------------------------------------------------------------------
 LOG file opened at 05/07/20 16:03:21                                                                                   
                                                                                                                        
Total Number of Files=1                                                                                                 
                                                                                                                        
Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz                                                                                                        
                                                                                                                        
Log File: COPY$31_105537.log                                                                                            
                                                                                                                        
 LOG file opened at 05/07/20 16:03:21                                                                                   

Total Number of Files=1                                                                                                 

Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz                                                                                                        

Log File: COPY$31_105537.log                                                                                            
                                                                                                                        
 LOG file opened at 05/07/20 16:03:21                                                                                   

KUP-05014:   Warning: Intra source concurrency disabled because the URLs specified for the Cloud Service map to compressed data.                                                                                                                
                                                                                                                                                                                                                                                
Bad File: COPY$31_105537.bad                                                                                            
                                                                                                                        
Field Definitions for table COPY$Y2R021UKPJ5F75JCMSKL                                                                   
  Record format DELIMITED BY                                                                                            
  Data in file has same endianness as the platform                                                                      
  Rows with all null fields are accepted                                                                                
  Table level NULLIF (Field = BLANKS)                                                                                   
  Fields in Data Source:                                                                                                
                                                                                                                        
    SNAP_ID                         CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
    DBID                            CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
    INSTANCE_NUMBER                 CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
    SAMPLE_ID                       CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
    SAMPLE_TIME                     CHAR (255)                                                                          
      Date datatype TIMESTAMP, date mask YYYY-MM-DD/HH24:mi:ss.ff                                                       
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
…
    CON_ID                          CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               

Date Cache Statistics for table COPY$Y2R021UKPJ5F75JCMSKL                                                               
  Date conversion cache disabled due to overflow (default size: 1000)                                                   

365 rows selected.
These files are written to the DATA_PUMP_DIR database directory.  We don't have access to the database file system in an Autonomous Database, so Oracle has provided the LIST_FILES table function in DBMS_CLOUD so that we can see what files are in a directory.
Set pages 99 lines 150
Column object_name format a32
Column created format a32
Column last_modified format a32
Column checksum format a20
SELECT * FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR');

OBJECT_NAME                           BYTES CHECKSUM             CREATED                          LAST_MODIFIED
-------------------------------- ---------- -------------------- -------------------------------- --------------------------------
…
COPY$31_dflt.log                          0                      07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
COPY$31_dflt.bad                          0                      07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
COPY$31_105537.log                    13591                      07-MAY-20 16.03.21.000000 +00:00 07-MAY-20 16.05.35.000000 +00:00
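These log and bad files accumulate in the directory over time.  They can be removed with the DBMS_CLOUD.DELETE_FILE procedure; a sketch, using one of the file names from the listing above:
BEGIN
  DBMS_CLOUD.DELETE_FILE(
    directory_name => 'DATA_PUMP_DIR',
    file_name      => 'COPY$31_105537.log');
END;
/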
Statistics are automatically collected on the table by the copy process because it was done in direct-path mode.  We can see that the number of rows in the statistics corresponds to the number of rows loaded by the COPY_DATA procedure.
Set pages 99 lines 140
Column owner format a10
Column IM_STAT_UPDATE_TIME format a30
Select * 
from all_tab_statistics
Where table_name = 'ASH_HIST';

OWNER      TABLE_NAME PARTITION_ PARTITION_POSITION SUBPARTITI SUBPARTITION_POSITION OBJECT_TYPE    NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ---------- ------------------ ---------- --------------------- ------------ ---------- ---------- ------------
 AVG_SPACE  CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO IM_IMCU_COUNT
---------- ---------- ----------- ------------------------- ------------------- ----------------- ------------------- -------------
IM_BLOCK_COUNT IM_STAT_UPDATE_TIME             SCAN_RATE SAMPLE_SIZE LAST_ANALYZED       GLO USE STATT STALE_S SCOPE
-------------- ------------------------------ ---------- ----------- ------------------- --- --- ----- ------- -------
SOE        ASH_HIST                                                                  TABLE           1409305      19426            0
         0          0         486                         0                   0
                                                             1409305 15:16:14 07.05.2020 YES NO        NO      SHARED
I can confirm that the data is compressed because the compression type of every sampled row is 8 (HCC QUERY LOW).  See also DBMS_COMPRESSION Compression Types.
WITH x AS (
select dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
from soe.ash_hist sample (.1))
Select ctype, count(*) From x group by ctype;

     CTYPE   COUNT(*)
---------- ----------
         8      14097
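The type code returned by GET_COMPRESSION_TYPE maps to the documented DBMS_COMPRESSION constants (1 = none, 2 = advanced row compression, 4 = HCC QUERY HIGH, 8 = HCC QUERY LOW).  A variant of the same query that decodes these codes, as a sketch:
WITH x AS (
  SELECT dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
  FROM   soe.ash_hist SAMPLE (.1))
SELECT ctype
       -- literal values of the DBMS_COMPRESSION constants
       -- (package constants cannot be referenced in plain SQL)
,      CASE ctype
         WHEN 1 THEN 'NONE'
         WHEN 2 THEN 'ADVANCED'
         WHEN 4 THEN 'HCC QUERY HIGH'
         WHEN 8 THEN 'HCC QUERY LOW'
         ELSE 'OTHER'
       END compression_type
,      COUNT(*)
FROM   x
GROUP BY ctype;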
I can find this SQL statement in the Performance Hub.
INSERT /*+ append enable_parallel_dml */ INTO "SOE"."ASH_HIST" SELECT * FROM COPY$Y2R021UKPJ5F75JCMSKL
Therefore, the data was inserted from the temporary external table into the permanent table, in direct-path mode and in parallel.
I can also look at the OCI Performance Hub and see that most of the time was spent on CPU.  I can see the SQL_ID of the insert statement and the call to the DBMS_CLOUD procedure.
I can drill in further to the exact SQL statement.
When I query the table I get exactly the same data as previously with the external table.
set autotrace on timi on lines 180 trimspool on
break on report
compute sum of ash_secs on report
column min(sample_time) format a22
column max(sample_time) format a22
select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
from soe.ash_hist
group by event
order by ash_Secs desc
;

EVENT                                                              ASH_SECS MIN(SAMPLE_TIME)       MAX(SAMPLE_TIME)
---------------------------------------------------------------- ---------- ---------------------- ----------------------
                                                                   10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
direct path read                                                    3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
SQL*Net more data to client                                          269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
direct path write temp                                                32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
gc cr block busy                                                      24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804
…
latch free                                                               10 28-MAR-20 20.26.11.307 28-MAR-20 20.26.11.307
                                                                 ----------
sum                                                                14093050

86 rows selected.

Elapsed: 00:00:00.62
I can see that the execution plan is now a single serial full scan of the table.
Execution Plan
----------------------------------------------------------
Plan hash value: 1336681691

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    84 |  1428 |  1848   (9)| 00:00:01 |
|   1 |  SORT ORDER BY              |          |    84 |  1428 |  1848   (9)| 00:00:01 |
|   2 |   HASH GROUP BY             |          |    84 |  1428 |  1848   (9)| 00:00:01 |
|   3 |    TABLE ACCESS STORAGE FULL| ASH_HIST |  1409K|    22M|  1753   (4)| 00:00:01 |
----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         11  recursive calls
         13  db block gets
      19255  consistent gets
      19247  physical reads
       2436  redo size
       5428  bytes sent via SQL*Net to client
        602  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         86  rows processed

Tuesday, June 02, 2020

Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 2. Reading from Object Storage with an External Table

This blog is the second in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

Create A Credential 

First, I need to create a credential that the database will use to connect to the OCI Object Storage. This is not the same as the credential that the OCI CLI used to connect.
In the OCI interface navigate to Identity ➧ Users ➧ User Details, and create an Authentication Token.
It is important to copy the token at this point because you will not see it again.
Now you can put the token into a database credential.
connect admin/Password2020@gofaster1b_tp 
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'MY_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
    password=> 'K7xfi-mG<1Z:dq#88;1m'
  );
END;
/
Note: The bucket that I created earlier is private by default, so I can only access it with a credential for an authenticated user.  Without a valid credential, only a public bucket can be read; against this private bucket I would obtain an error.
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-20404: Object not found -
https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-202005
05-1552/o/dba_hist_active_sess_history.txt.gz
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 964
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD_INTERNAL", line 3891
ORA-06512: at line 1
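A quick way to check that a credential can reach the bucket is to list its contents with the DBMS_CLOUD.LIST_OBJECTS table function; a sketch:
SELECT object_name, bytes
FROM   DBMS_CLOUD.LIST_OBJECTS(
         credential_name => 'MY_BUCKET',
         location_uri    => 'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/');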

Create an External Table

In my blog Reading the Active Session History Compressed Export File in eDB360/SQLd360 as an External Table, I showed how to create an external table to read a compressed file.  Now I am going to do the same thing, except that this time I am going to read the file from OCI Object Storage with an external table created with DBMS_CLOUD.
  • I have to provide a list of columns in the external table and a list of fields in the flat file.
  • N.B. Some column names end in a # symbol.  These must be put in double-quotes in the field list, though this is not needed in the column list.
  • The Access Parameters section of the ORACLE_LOADER access driver that I used to create the external table becomes the contents of the format parameter.  I have created a JSON object to hold the various parameters.  The parameters are not exactly the same; in fact, I have added some.  See also DBMS_CLOUD Package Format Options.
DROP TABLE ash_hist PURGE;
BEGIN
   DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name =>'ASH_HIST',
    credential_name =>'MY_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    format => json_object('blankasnull'      value 'true'
                         ,'compression'      value 'gzip'
                         ,'dateformat'       value 'YYYY-MM-DD/HH24:mi:ss'
                         ,'timestampformat'  value 'YYYY-MM-DD/HH24:mi:ss.ff'
                         ,'delimiter'        value '<,>'
                         ,'ignoreblanklines' value 'true'
                         ,'rejectlimit'      value '10'
                         ,'removequotes'     value 'true'
                         ,'trimspaces'       value 'lrtrim'
                         ),
    column_list => 'SNAP_ID NUMBER
,DBID NUMBER
,INSTANCE_NUMBER NUMBER
,SAMPLE_ID NUMBER
,SAMPLE_TIME TIMESTAMP(3)
,SESSION_ID NUMBER
,SESSION_SERIAL# NUMBER
,SESSION_TYPE VARCHAR2(10)
,FLAGS NUMBER
,USER_ID NUMBER
-----------------------------------------
,SQL_ID VARCHAR2(13)
,IS_SQLID_CURRENT VARCHAR2(1)
,SQL_CHILD_NUMBER NUMBER
,SQL_OPCODE NUMBER
,SQL_OPNAME VARCHAR2(64)
,FORCE_MATCHING_SIGNATURE NUMBER
,TOP_LEVEL_SQL_ID VARCHAR2(13)
,TOP_LEVEL_SQL_OPCODE NUMBER
,SQL_PLAN_HASH_VALUE NUMBER
,SQL_FULL_PLAN_HASH_VALUE NUMBER 
-----------------------------------------
,SQL_ADAPTIVE_PLAN_RESOLVED NUMBER
,SQL_PLAN_LINE_ID NUMBER
,SQL_PLAN_OPERATION VARCHAR2(64)
,SQL_PLAN_OPTIONS VARCHAR2(64)
,SQL_EXEC_ID NUMBER
,SQL_EXEC_START DATE
,PLSQL_ENTRY_OBJECT_ID NUMBER
,PLSQL_ENTRY_SUBPROGRAM_ID NUMBER
,PLSQL_OBJECT_ID NUMBER
,PLSQL_SUBPROGRAM_ID NUMBER
-----------------------------------------
,QC_INSTANCE_ID NUMBER
,QC_SESSION_ID NUMBER
,QC_SESSION_SERIAL# NUMBER
,PX_FLAGS NUMBER
,EVENT VARCHAR2(64)
,EVENT_ID NUMBER
,SEQ# NUMBER
,P1TEXT VARCHAR2(64)
,P1 NUMBER
,P2TEXT VARCHAR2(64)
-----------------------------------------
,P2 NUMBER
,P3TEXT VARCHAR2(64)
,P3 NUMBER
,WAIT_CLASS VARCHAR2(64)
,WAIT_CLASS_ID NUMBER
,WAIT_TIME NUMBER
,SESSION_STATE VARCHAR2(7)
,TIME_WAITED NUMBER
,BLOCKING_SESSION_STATUS VARCHAR2(11)
,BLOCKING_SESSION NUMBER
-----------------------------------------
,BLOCKING_SESSION_SERIAL# NUMBER
,BLOCKING_INST_ID NUMBER
,BLOCKING_HANGCHAIN_INFO VARCHAR2(1)
,CURRENT_OBJ# NUMBER
,CURRENT_FILE# NUMBER
,CURRENT_BLOCK# NUMBER
,CURRENT_ROW# NUMBER
,TOP_LEVEL_CALL# NUMBER
,TOP_LEVEL_CALL_NAME VARCHAR2(64)
,CONSUMER_GROUP_ID NUMBER
-----------------------------------------
,XID RAW(8)
,REMOTE_INSTANCE# NUMBER
,TIME_MODEL NUMBER
,IN_CONNECTION_MGMT VARCHAR2(1)
,IN_PARSE VARCHAR2(1)
,IN_HARD_PARSE VARCHAR2(1)
,IN_SQL_EXECUTION VARCHAR2(1)
,IN_PLSQL_EXECUTION VARCHAR2(1)
,IN_PLSQL_RPC VARCHAR2(1)
,IN_PLSQL_COMPILATION VARCHAR2(1)
-----------------------------------------
,IN_JAVA_EXECUTION VARCHAR2(1)
,IN_BIND VARCHAR2(1)
,IN_CURSOR_CLOSE VARCHAR2(1)
,IN_SEQUENCE_LOAD VARCHAR2(1)
,IN_INMEMORY_QUERY VARCHAR2(1) /*added 12.1*/
,IN_INMEMORY_POPULATE VARCHAR2(1) /*added 12.1*/
,IN_INMEMORY_PREPOPULATE VARCHAR2(1) /*added 12.1*/
,IN_INMEMORY_REPOPULATE VARCHAR2(1) /*added 12.1*/
,IN_INMEMORY_TREPOPULATE VARCHAR2(1) /*added 12.1*/
,CAPTURE_OVERHEAD VARCHAR2(1)
-----------------------------------------
,REPLAY_OVERHEAD VARCHAR2(1)
,IS_CAPTURED VARCHAR2(1)
,IS_REPLAYED VARCHAR2(1)
,SERVICE_HASH NUMBER
,PROGRAM VARCHAR2(64)
,MODULE VARCHAR2(64)
,ACTION VARCHAR2(64)
,CLIENT_ID VARCHAR2(64)
,MACHINE VARCHAR2(64)
,PORT NUMBER
-----------------------------------------
,ECID VARCHAR2(64)
,DBREPLAY_FILE_ID NUMBER /*added 12.1*/
,DBREPLAY_CALL_COUNTER NUMBER /*added 12.1*/
,TM_DELTA_TIME NUMBER
,TM_DELTA_CPU_TIME NUMBER
,TM_DELTA_DB_TIME NUMBER
,DELTA_TIME NUMBER
,DELTA_READ_IO_REQUESTS NUMBER
,DELTA_WRITE_IO_REQUESTS NUMBER
,DELTA_READ_IO_BYTES NUMBER
-----------------------------------------
,DELTA_WRITE_IO_BYTES NUMBER
,DELTA_INTERCONNECT_IO_BYTES NUMBER
,PGA_ALLOCATED NUMBER
,TEMP_SPACE_ALLOCATED NUMBER
,DBOP_NAME VARCHAR2(64) /*added 12.1*/
,DBOP_EXEC_ID NUMBER /*added 12.1*/
,CON_DBID NUMBER /*added 12.1*/
,CON_ID NUMBER /*added 12.1*/'
-----------------------------------------
,field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START ,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES 
,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID'
);
END;
/
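Before querying the new external table in earnest, it can be checked with the DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE procedure, which scans the source file and reports rejected rows; a minimal sketch (the row limit is arbitrary):
BEGIN
  DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE(
    table_name => 'ASH_HIST',
    rowcount   => 1000);  -- validate just the first 1000 rows
END;
/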
This file contains 1.4M rows in a 200MB compressed file; uncompressed, it would be 4.6GB. It takes about 81 seconds to perform a full scan of it.
set autotrace on timi on pages 99 lines 160
break on report
compute sum of ash_secs on report
column event format a40
column min(sample_time) format a22
column max(sample_time) format a22
select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
from ash_hist
--where rownum <= 1000
group by event
order by ash_Secs desc
;
EVENT                                      ASH_SECS MIN(SAMPLE_TIME)       MAX(SAMPLE_TIME)
---------------------------------------- ---------- ---------------------- ----------------------
                                           10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
direct path read                            3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
SQL*Net more data to client                  269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
direct path write temp                        32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
gc cr block busy                              24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804
…
latch: gc element                                10 30-MAR-20 18.42.51.748 30-MAR-20 18.42.51.748
                                         ----------
sum                                        14093050

86 rows selected.

Elapsed: 00:01:21.17
We can see from the plan that the external table was full scanned in parallel.
Execution Plan
----------------------------------------------------------
Plan hash value: 4220750095

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |  8344K|   374M|  1417  (33)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)                 | :TQ10002 |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,02 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY                     |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE                       |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,02 | PCWP |            |
|   5 |      PX SEND RANGE                   | :TQ10001 |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,01 | P->P | RANGE      |
|   6 |       HASH GROUP BY                  |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,01 | PCWP |            |
|   7 |        PX RECEIVE                    |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,01 | PCWP |            |
|   8 |         PX SEND HASH                 | :TQ10000 |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   9 |          HASH GROUP BY               |          |  8344K|   374M|  1417  (33)| 00:00:01 |  Q1,00 | PCWP |            |
|  10 |           PX BLOCK ITERATOR          |          |  8344K|   374M|  1089  (13)| 00:00:01 |  Q1,00 | PCWC |            |
|  11 |            EXTERNAL TABLE ACCESS FULL| ASH_HIST |  8344K|   374M|  1089  (13)| 00:00:01 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit


Statistics
----------------------------------------------------------
       2617  recursive calls
          3  db block gets
       2751  consistent gets
          0  physical reads
        728  redo size
       5428  bytes sent via SQL*Net to client
        602  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
        346  sorts (memory)
          0  sorts (disk)
         86  rows processed
In the next post, I will explain how to copy the data directly from Object Storage into a regular table.

Monday, June 01, 2020

Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 1. Upload to Object Storage

This blog is the first in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.
Last year I wrote a blog titled Reading the Active Session History Compressed Export File in eDB360/SQLd360 as an External Table.  I set myself the challenge of doing the same thing with an Autonomous Database.  I would imagine that these techniques are commonly used in Oracle Cloud operations, yet I found the documentation spread over a number of places, and it took me a while to get it right.  So, I hope you find this series helpful.

Install OCI

I could just upload my data file through the browser directly into an object storage bucket, but I don't want to copy it to a Windows desktop first; that is not a good option for very large files.  Instead, I am going to install the OCI Command Line Interface onto the Linux VM where my data file resides (see OCI CLI Quickstart Guide).
I am installing this into the oracle user on a Linux VM where the Oracle database has previously been installed, so I just accepted all the defaults.
bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"

Set up Token-Based Authentication for OCI

I couldn't get the instructions for generating a token without a browser to work.  Instead, I installed the OCI CLI on a Windows machine, generated a token there, and transferred it to my Linux VM (see Token-based Authentication for the CLI).
C:\Users\david.kurtz>oci session authenticate
Enter a region (e.g. ap-mumbai-1, ap-seoul-1, ap-sydney-1, ap-tokyo-1, ca-toronto-1, eu-frankfurt-1, eu-zurich-1, sa-saopaulo-1, uk-london-1, us-ashburn-1, us-gov-ashburn-1, us-gov-chicago-1, us-gov-phoenix-1, us-langley-1, us-luke-1, us-phoenix-1): uk-london-1
    Please switch to newly opened browser window to log in!
    Completed browser authentication process!
Config written to: C:\Users\david.kurtz\.oci\config

    Try out your newly created session credentials with the following example command:

    oci iam region list --config-file C:\Users\david.kurtz\.oci\config --profile DEFAULT --auth security_token
If I run the suggested example command, I get this response with the list of OCI regions.
{
  "data": [
…
    {
      "key": "LHR",
      "name": "uk-london-1"
    },
…
  ]
}

Export OCI Profile

Now I can export the profile to a zip file.
C:\Users\david.kurtz>oci session export --profile DEFAULT --output-file DEFAULT
File DEFAULT.zip already exists, do you want to overwrite it? [y/N]: y
Exporting profile: DEFAULT from config file: C:\Users\david.kurtz\.oci\config
Export file written to: C:\Users\david.kurtz\DEFAULT.zip

Import OCI Profile

I can transfer this zip file to my Linux VM and import it.
[oracle@oracle-database .oci]$ oci session import --session-archive ./DEFAULT.zip --force
Config already contains a profile with the same name as the archived profile: DEFAULT. Provide an alternative name for the imported profile: myprofile
Imported profile myprofile written to: /home/oracle/.oci/config

    Try out your newly imported session credentials with the following example command:

    oci iam region list --config-file /home/oracle/.oci/config --profile myprofile --auth security_token
I can test it by again getting the list of OCI regions.

Upload a File

I have created a bucket on OCI.
I could upload a file through the OCI web interface, but I want to use the command line from my Linux VM.
[oracle@oracle-database ~]$ oci os object put --bucket-name bucket-20200505-1552 --file /media/sf_temp/dba_hist_active_sess_history.txt.gz --disable-parallel-uploads --config-file /home/oracle/.oci/config --profile myprofile --auth security_token
Upload ID: 1ad452f7-ab49-a24b-2fe9-f55f565cdf40
Split file into 2 parts for upload.
Uploading object  [####################################]  100%
{
  "etag": "66681c40-4e11-4b73-baf9-cc1e4c3ebd5f",
  "last-modified": "Wed, 06 May 2020 15:17:03 GMT",
  "opc-multipart-md5": "MFdfU7vGZlJ5Mb4nopxtpw==-2"
}
I can see the file in the bucket via the web interface, and I can see that the size and the MD5 checksum are both correct.
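Once a database credential has been created (see part 2 of this series), the upload can also be verified from the database with the DBMS_CLOUD.LIST_OBJECTS table function; a sketch:
SELECT object_name, bytes, checksum
FROM   DBMS_CLOUD.LIST_OBJECTS(
         credential_name => 'MY_BUCKET',
         location_uri    => 'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/');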
In the next post, I will explain how to read the file from Object Storage using an External Table.