Wednesday, June 03, 2020

Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 3. Copying data from Object Storage to a Regular Table

This blog is the third in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

Copy Data into Table 

Alternatively, we can copy the data into a normal table. The table needs to be created in advance. This time, I am going to run the copy as user SOE rather than ADMIN.  I need to:
  • Grant the CONNECT and RESOURCE roles, and quota on the DATA tablespace.
  • Grant execute on DBMS_CLOUD to SOE, so it can execute the command.
  • Grant READ and WRITE access on the DATA_PUMP_DIR directory – the log and bad files created by this process are written to this database directory.
connect admin/Password2020!@gofaster1b_tp 
CREATE USER soe IDENTIFIED BY Password2020;
GRANT CONNECT, RESOURCE TO soe;
GRANT EXECUTE ON DBMS_CLOUD TO soe;
GRANT READ, WRITE ON DIRECTORY data_pump_dir TO soe;
ALTER USER soe QUOTA UNLIMITED ON data;
I am now going to switch to user SOE and create my table.
connect soe/Password2020@gofaster1b_tp
Drop table soe.ash_hist purge;
CREATE TABLE soe.ASH_HIST
   (    SNAP_ID NUMBER,
        DBID NUMBER,
        INSTANCE_NUMBER NUMBER,
        SAMPLE_ID NUMBER,
        SAMPLE_TIME TIMESTAMP (3),
--      SAMPLE_TIME_UTC TIMESTAMP (3),
--      USECS_PER_ROW NUMBER,
        SESSION_ID NUMBER,
        SESSION_SERIAL# NUMBER,
        SESSION_TYPE VARCHAR2(10),
        FLAGS NUMBER,
        USER_ID NUMBER,
-----------------------------------------
        SQL_ID VARCHAR2(13),
        IS_SQLID_CURRENT VARCHAR2(1),
        SQL_CHILD_NUMBER NUMBER,
        SQL_OPCODE NUMBER,
        SQL_OPNAME VARCHAR2(64),
        FORCE_MATCHING_SIGNATURE NUMBER,
        TOP_LEVEL_SQL_ID VARCHAR2(13),
        TOP_LEVEL_SQL_OPCODE NUMBER,
        SQL_PLAN_HASH_VALUE NUMBER,
        SQL_FULL_PLAN_HASH_VALUE NUMBER,
-----------------------------------------
        SQL_ADAPTIVE_PLAN_RESOLVED NUMBER,
        SQL_PLAN_LINE_ID NUMBER,
        SQL_PLAN_OPERATION VARCHAR2(64),
        SQL_PLAN_OPTIONS VARCHAR2(64),
        SQL_EXEC_ID NUMBER,
        SQL_EXEC_START DATE,
        PLSQL_ENTRY_OBJECT_ID NUMBER,
        PLSQL_ENTRY_SUBPROGRAM_ID NUMBER,
        PLSQL_OBJECT_ID NUMBER,
        PLSQL_SUBPROGRAM_ID NUMBER,
-----------------------------------------
        QC_INSTANCE_ID NUMBER,
        QC_SESSION_ID NUMBER,
        QC_SESSION_SERIAL# NUMBER,
        PX_FLAGS NUMBER,
        EVENT VARCHAR2(64),
        EVENT_ID NUMBER,
        SEQ# NUMBER,
        P1TEXT VARCHAR2(64),
        P1 NUMBER,
        P2TEXT VARCHAR2(64),
-----------------------------------------
        P2 NUMBER,
        P3TEXT VARCHAR2(64),
        P3 NUMBER,
        WAIT_CLASS VARCHAR2(64),
        WAIT_CLASS_ID NUMBER,
        WAIT_TIME NUMBER,
        SESSION_STATE VARCHAR2(7),
        TIME_WAITED NUMBER,
        BLOCKING_SESSION_STATUS VARCHAR2(11),
        BLOCKING_SESSION NUMBER,
-----------------------------------------
        BLOCKING_SESSION_SERIAL# NUMBER,
        BLOCKING_INST_ID NUMBER,
        BLOCKING_HANGCHAIN_INFO VARCHAR2(1),
        CURRENT_OBJ# NUMBER,
        CURRENT_FILE# NUMBER,
        CURRENT_BLOCK# NUMBER,
        CURRENT_ROW# NUMBER,
        TOP_LEVEL_CALL# NUMBER,
        TOP_LEVEL_CALL_NAME VARCHAR2(64),
        CONSUMER_GROUP_ID NUMBER,
-----------------------------------------
        XID RAW(8),
        REMOTE_INSTANCE# NUMBER,
        TIME_MODEL NUMBER,
        IN_CONNECTION_MGMT VARCHAR2(1),
        IN_PARSE VARCHAR2(1),
        IN_HARD_PARSE VARCHAR2(1),
        IN_SQL_EXECUTION VARCHAR2(1),
        IN_PLSQL_EXECUTION VARCHAR2(1),
        IN_PLSQL_RPC VARCHAR2(1),
        IN_PLSQL_COMPILATION VARCHAR2(1),
-----------------------------------------
        IN_JAVA_EXECUTION VARCHAR2(1),
        IN_BIND VARCHAR2(1),
        IN_CURSOR_CLOSE VARCHAR2(1),
        IN_SEQUENCE_LOAD VARCHAR2(1),
        IN_INMEMORY_QUERY VARCHAR2(1),
        IN_INMEMORY_POPULATE VARCHAR2(1),
        IN_INMEMORY_PREPOPULATE VARCHAR2(1),
        IN_INMEMORY_REPOPULATE VARCHAR2(1),
        IN_INMEMORY_TREPOPULATE VARCHAR2(1),
--      IN_TABLESPACE_ENCRYPTION VARCHAR2(1),
        CAPTURE_OVERHEAD VARCHAR2(1),
-----------------------------------------
        REPLAY_OVERHEAD VARCHAR2(1),
        IS_CAPTURED VARCHAR2(1),
        IS_REPLAYED VARCHAR2(1),
--      IS_REPLAY_SYNC_TOKEN_HOLDER VARCHAR2(1),
        SERVICE_HASH NUMBER,
        PROGRAM VARCHAR2(64),
        MODULE VARCHAR2(64),
        ACTION VARCHAR2(64),
        CLIENT_ID VARCHAR2(64),
        MACHINE VARCHAR2(64),
        PORT NUMBER,
-----------------------------------------
        ECID VARCHAR2(64),
        DBREPLAY_FILE_ID NUMBER,
        DBREPLAY_CALL_COUNTER NUMBER,
        TM_DELTA_TIME NUMBER,
        TM_DELTA_CPU_TIME NUMBER,
        TM_DELTA_DB_TIME NUMBER,
        DELTA_TIME NUMBER,
        DELTA_READ_IO_REQUESTS NUMBER,
        DELTA_WRITE_IO_REQUESTS NUMBER,
        DELTA_READ_IO_BYTES NUMBER,
-----------------------------------------
        DELTA_WRITE_IO_BYTES NUMBER,
        DELTA_INTERCONNECT_IO_BYTES NUMBER,
        PGA_ALLOCATED NUMBER,
        TEMP_SPACE_ALLOCATED NUMBER,
        DBOP_NAME VARCHAR2(64),
        DBOP_EXEC_ID NUMBER,
        CON_DBID NUMBER,
        CON_ID NUMBER,
-----------------------------------------
        CONSTRAINT ash_hist_pk PRIMARY KEY (dbid, instance_number, snap_id, sample_id, session_id)
   ) 
COMPRESS FOR QUERY LOW
/
As Autonomous Databases run on Exadata, I have also specified Hybrid Columnar Compression (HCC) for this table.
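QUERY LOW is the lightest of the four HCC levels (QUERY LOW, QUERY HIGH, ARCHIVE LOW, ARCHIVE HIGH, in increasing order of compression), favouring load and scan speed over compression ratio. If the ratio mattered more than load time, the table could be rebuilt at a heavier level afterwards. A hypothetical alternative, not something I did here:
ALTER TABLE soe.ash_hist MOVE COMPRESS FOR ARCHIVE HIGH;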
Credentials are specific to the database user.  I have to create an additional credential, for the same cloud user, but owned by SOE.
ALTER SESSION SET nls_date_Format='hh24:mi:ss dd.mm.yyyy';
set serveroutput on timi on
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'SOE_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
    password=> 'K7xfi-mG<1Z:dq#88;1m'
  );
END;
/
column owner format a10
column credential_name format a20
column comments format a80
column username format a40
SELECT * FROM dba_credentials;

OWNER      CREDENTIAL_NAME      USERNAME                                 WINDOWS_DOMAIN
---------- -------------------- ---------------------------------------- ------------------------------
COMMENTS                                                                         ENABL
-------------------------------------------------------------------------------- -----
ADMIN      MY_BUCKET            oraclecloud1@go-faster.co.uk
{"comments":"Created via DBMS_CLOUD.create_credential"}                          TRUE

SOE        SOE_BUCKET           oraclecloud1@go-faster.co.uk
{"comments":"Created via DBMS_CLOUD.create_credential"}                          TRUE
The COPY_DATA procedure is similar to CREATE_EXTERNAL_TABLE described in the previous post, but it doesn't have a column list. The field names must match the column names. It is sensitive to field names with a trailing #; these must be enclosed in double-quotes.
TRUNCATE TABLE soe.ash_hist;
DECLARE
  l_operation_id NUMBER;
BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name =>'ASH_HIST',
    credential_name =>'SOE_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    schema_name => 'SOE',
    format => json_object('blankasnull'      value 'true'
                         ,'compression'      value 'gzip'
                         ,'dateformat'       value 'YYYY-MM-DD/HH24:mi:ss'
                         ,'timestampformat'  value 'YYYY-MM-DD/HH24:mi:ss.ff'
                         ,'delimiter'        value '<,>'
                         ,'ignoreblanklines' value 'true'
                         ,'rejectlimit'      value '10'
                         ,'removequotes'     value 'true'
                         ,'trimspaces'       value 'lrtrim'
                         ),
    field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES 
,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID',
    operation_id=>l_operation_id
  );
  dbms_output.put_line('Operation ID:'||l_operation_id||' finished successfully');
EXCEPTION WHEN OTHERS THEN
  dbms_output.put_line('Operation ID:'||l_operation_id||' raised an error');
  RAISE;
END;
/
The data copy takes slightly longer than the query on the external table did.
Operation ID:31 finished successfully

PL/SQL procedure successfully completed.

Elapsed: 00:02:01.11
The status of the copy operation is reported in USER_LOAD_OPERATIONS.  This includes the number of rows loaded and the names of external tables that are created for the log and bad files.
set lines 120
column type format a10
column file_uri_list format a64
column start_time format a32
column update_time format a32
column owner_name format a10
column table_name format a10
column partition_name format a10
column subpartition_name format a10
column logfile_table format a15
column badfile_table format a15
column tempext_table format a30
select * from user_load_operations where id = &operation_id;

        ID TYPE              SID    SERIAL# START_TIME                       UPDATE_TIME                      STATUS
---------- ---------- ---------- ---------- -------------------------------- -------------------------------- ---------
OWNER_NAME TABLE_NAME PARTITION_ SUBPARTITI FILE_URI_LIST                                                    ROWS_LOADED
---------- ---------- ---------- ---------- ---------------------------------------------------------------- -----------
LOGFILE_TABLE   BADFILE_TABLE   TEMPEXT_TABLE
--------------- --------------- ------------------------------
        31 COPY            19965      44088 07-MAY-20 17.03.20.328263 +01:00 07-MAY-20 17.05.36.157680 +01:00 COMPLETED
SOE        ASH_HIST                         https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu     1409305
                                            /b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz
COPY$31_LOG     COPY$31_BAD     COPY$Y2R021UKPJ5F75JCMSKL
An external table is temporarily created by the COPY_DATA procedure and dropped again before the procedure completes, but the external tables over the log and bad files remain. The bad file is empty because the copy operation succeeded without error, but we can query the copy log.
select * from COPY$31_LOG;

RECORD                                                                                                                  
------------------------------------------------------------------------------------------------------------------------
 LOG file opened at 05/07/20 16:03:21                                                                                   
                                                                                                                        
Total Number of Files=1                                                                                                 
                                                                                                                        
Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz                                                                                                        
                                                                                                                        
Log File: COPY$31_105537.log                                                                                            
                                                                                                                        
 LOG file opened at 05/07/20 16:03:21                                                                                   

Total Number of Files=1                                                                                                 

Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz                                                                                                        

Log File: COPY$31_105537.log                                                                                            
                                                                                                                        
 LOG file opened at 05/07/20 16:03:21                                                                                   

KUP-05014:   Warning: Intra source concurrency disabled because the URLs specified for the Cloud Service map to compressed data.                                                                                                                
                                                                                                                                                                                                                                                
Bad File: COPY$31_105537.bad                                                                                            
                                                                                                                        
Field Definitions for table COPY$Y2R021UKPJ5F75JCMSKL                                                                   
  Record format DELIMITED BY                                                                                            
  Data in file has same endianness as the platform                                                                      
  Rows with all null fields are accepted                                                                                
  Table level NULLIF (Field = BLANKS)                                                                                   
  Fields in Data Source:                                                                                                
                                                                                                                        
    SNAP_ID                         CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
    DBID                            CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
    INSTANCE_NUMBER                 CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
    SAMPLE_ID                       CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
    SAMPLE_TIME                     CHAR (255)                                                                          
      Date datatype TIMESTAMP, date mask YYYY-MM-DD/HH24:mi:ss.ff                                                       
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               
…
    CON_ID                          CHAR (255)                                                                          
      Terminated by "<,>"                                                                                               
      Trim whitespace from left and right                                                                               

Date Cache Statistics for table COPY$Y2R021UKPJ5F75JCMSKL                                                               
  Date conversion cache disabled due to overflow (default size: 1000)                                                   

365 rows selected.
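Had any rows been rejected, they could have been inspected in the same way through the bad file external table reported in USER_LOAD_OPERATIONS (empty in this case, because the load completed without error):
select * from COPY$31_BAD;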
These files are written to the DATA_PUMP_DIR database directory.  We don't have access to the database file system in the Autonomous Database, so Oracle has provided the LIST_FILES table function in DBMS_CLOUD so that we can see what files are in a directory.
Set pages 99 lines 150
Column object_name format a32
Column created format a32
Column last_modified format a32
Column checksum format a20
SELECT * FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR');

OBJECT_NAME                           BYTES CHECKSUM             CREATED                          LAST_MODIFIED
-------------------------------- ---------- -------------------- -------------------------------- --------------------------------
…
COPY$31_dflt.log                          0                      07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
COPY$31_dflt.bad                          0                      07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
COPY$31_105537.log                    13591                      07-MAY-20 16.03.21.000000 +00:00 07-MAY-20 16.05.35.000000 +00:00
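These log and bad files accumulate in DATA_PUMP_DIR over time. Although we cannot reach the file system directly, they can be removed with the DBMS_CLOUD.DELETE_FILE procedure. A sketch, using the file names from the listing above:
EXEC DBMS_CLOUD.DELETE_FILE('DATA_PUMP_DIR','COPY$31_dflt.log');
EXEC DBMS_CLOUD.DELETE_FILE('DATA_PUMP_DIR','COPY$31_dflt.bad');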
Statistics are automatically collected on the table by the copy process because the insert was done in direct-path mode.  We can see that the number of rows in the statistics corresponds to the number of rows loaded by the COPY_DATA procedure.
Set pages 99 lines 140
Column owner format a10
Column IM_STAT_UPDATE_TIME format a30
Select * 
from all_tab_statistics
Where table_name = 'ASH_HIST';

OWNER      TABLE_NAME PARTITION_ PARTITION_POSITION SUBPARTITI SUBPARTITION_POSITION OBJECT_TYPE    NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ---------- ------------------ ---------- --------------------- ------------ ---------- ---------- ------------
 AVG_SPACE  CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO IM_IMCU_COUNT
---------- ---------- ----------- ------------------------- ------------------- ----------------- ------------------- -------------
IM_BLOCK_COUNT IM_STAT_UPDATE_TIME             SCAN_RATE SAMPLE_SIZE LAST_ANALYZED       GLO USE STATT STALE_S SCOPE
-------------- ------------------------------ ---------- ----------- ------------------- --- --- ----- ------- -------
SOE        ASH_HIST                                                                  TABLE           1409305      19426            0
         0          0         486                         0                   0
                                                             1409305 15:16:14 07.05.2020 YES NO        NO      SHARED
I can confirm that the data is compressed because the compression type of every sampled row is type 8 (HCC QUERY LOW).  See also DBMS_COMPRESSION Compression Types.
WITH x AS (
select dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
from soe.ash_hist sample (.1))
Select ctype, count(*) From x group by ctype;

     CTYPE   COUNT(*)
---------- ----------
         8      14097
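GET_COMPRESSION_TYPE returns the constant values defined in DBMS_COMPRESSION: 1 is uncompressed, 2 is Advanced (OLTP) compression, and 4, 8, 16 and 32 are the four HCC levels. A sketch that decodes the value into a readable name (the literals correspond to the documented constants):
WITH x AS (
select dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
from soe.ash_hist sample (.1))
Select decode(ctype, 1,'NONE'
                   , 2,'ADVANCED (OLTP)'
                   , 4,'HCC QUERY HIGH'
                   , 8,'HCC QUERY LOW'
                   ,16,'HCC ARCHIVE HIGH'
                   ,32,'HCC ARCHIVE LOW'
                   ,   'OTHER ('||ctype||')') compression_type
,      count(*)
From x
Group by ctype;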
I can find this SQL Statement in the Performance Hub. 
INSERT /*+ append enable_parallel_dml */ INTO "SOE"."ASH_HIST" SELECT * FROM COPY$Y2R021UKPJ5F75JCMSKL
Therefore, the data was inserted from the temporary external table into the permanent table in direct-path mode and in parallel.
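The statement can also be found from SQL rather than the Performance Hub, assuming the cursor is still in the shared pool. A non-zero PX_SERVERS_EXECUTIONS confirms that it ran in parallel:
select sql_id, executions, px_servers_executions
from v$sql
where sql_text like 'INSERT /*+ append enable_parallel_dml */ INTO "SOE"."ASH_HIST"%';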
I can also look at the OCI Performance Hub and see that most of the time was spent on CPU.  I can see the SQL_ID of the INSERT statement and the call to the DBMS_CLOUD procedure.
I can drill in further to the exact SQL statement.
When I query the table, I get exactly the same data as I previously obtained from the external table.
set autotrace on timi on lines 180 trimspool on
break on report
compute sum of ash_secs on report
column min(sample_time) format a22
column max(sample_time) format a22
select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
from soe.ash_hist
group by event
order by ash_Secs desc
;

EVENT                                                              ASH_SECS MIN(SAMPLE_TIME)       MAX(SAMPLE_TIME)
---------------------------------------------------------------- ---------- ---------------------- ----------------------
                                                                   10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
direct path read                                                    3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
SQL*Net more data to client                                          269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
direct path write temp                                                32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
gc cr block busy                                                      24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804
…
latch free                                                               10 28-MAR-20 20.26.11.307 28-MAR-20 20.26.11.307
                                                                 ----------
sum                                                                14093050

86 rows selected.

Elapsed: 00:00:00.62
I can see that the execution plan is now a single serial full scan of the table.
Execution Plan
----------------------------------------------------------
Plan hash value: 1336681691

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    84 |  1428 |  1848   (9)| 00:00:01 |
|   1 |  SORT ORDER BY              |          |    84 |  1428 |  1848   (9)| 00:00:01 |
|   2 |   HASH GROUP BY             |          |    84 |  1428 |  1848   (9)| 00:00:01 |
|   3 |    TABLE ACCESS STORAGE FULL| ASH_HIST |  1409K|    22M|  1753   (4)| 00:00:01 |
----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         11  recursive calls
         13  db block gets
      19255  consistent gets
      19247  physical reads
       2436  redo size
       5428  bytes sent via SQL*Net to client
        602  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         86  rows processed
