Friday, November 30, 2018

A Brief Look Inside Oracle's Autonomous Data Warehouse Cloud

This post is part of a series that discusses some common issues in data warehouses.
There is lots of documentation for Autonomous Data Warehouse Cloud (ADWC), in which I found this bold claim:
  •  'Additionally, Autonomous Data Warehouse does not require any tuning. Autonomous Data Warehouse is designed as a ''load and go” service: you start the service, define tables, load data, and then run queries. When you use Autonomous Data Warehouse, no tuning is necessary. You do not need to consider any details about parallelism, partitioning, indexing, or compression. The service automatically configures the database for high-performance queries.' - Getting Started with Autonomous Data Warehouse 
I also found these references were helpful to get me going:
The documentation describes how to import data with the DBMS_CLOUD package or with data pump. You have to upload files onto cloud storage and import from there.  For data pump import, the various object types in the exclude parameter recommended in the documentation are a consequence of the 'load and go' approach.
impdp admin/password@ADWC1_high \       
     directory=data_pump_dir \       
     credential=def_cred_name \       
     dumpfile= https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/adwc/adwc_user/export%u.dmp \
     parallel=16 \
     partition_options=merge \ 
     transform=segment_attributes:n \
     transform=dwcs_cvt_iots:y \
     transform=constraint_use_default_index:y \
     exclude=index, cluster, indextype, materialized_view, materialized_view_log, materialized_zonemap, db_link
Clearly, Oracle expects you to periodically incrementally bulk load data and then work with essentially static data.

What is Autonomous Data Warehouse Cloud? 

It is built on Exadata, so it is an engineered system.  This is what I found I was running on:
  • 12.2.0.1.0 Single Instance RAC
  • SGA: 3.3Gb, PGA: 5Gb, RAM: 708Gb
  • Intel(R) Xeon(R) CPU E7-8867 v4 @ 2.40GHz 84 CPUs/Core/Threads
  • 12 storage servers
There are two pre-installed locked sample schemas
  • The sales history schema SH, that I have been using so far, is a bit different to the one you would build with the scripts on Github. 
  • SSB – Star Schema Benchmark.

How is the Oracle delivered Sales History demo schema in ADWC built? 

There are some significant differences with the delivered Sales History schema:
  • Primary key constraints are DISABLED, NOVALIDATE, RELY.  There is corresponding no unique index because the constraint was created DISABLED. 
    • Thus, there is nothing to prevent you loading duplicate data!
  • Foreign key constraints are present, but also DISABLED, NOVALIDATE, RELY.
    • Foreign keys are also not indexed, but as they are not enforced there is no risk of TM locking.
    • Foreign key join elimination can still happen because QUERY_REWRITE_INTEGRITY=TRUSTED
It is clearly not intended that referential integrity is going to be enforced by the database in ADWC. That is not an unreasonable decision in a data warehouse because you expect the data to be validated in the source system.  However, while it is one thing not to enforce referential integrity, I think that not enforcing uniqueness is courageous!  Although, part of the cost of doing so is maintaining a unique index.
I am surprised that the SALES table is not partitioned.  Partition pruning is a very effective way of eliminating data from a query, but perhaps Oracle is relying on the Bloom filtering and the storage indexes to instead.
PCTFREE is set to 0, so that the data is packed into data blocks without leaving room for updates.  This makes good sense.  In a data warehouse, we don't expect to update the data after it has been loaded, or cater also for OLTP behaviour.  This fits with the 'load and go' statement.
The tables have been Hybrid Columnar Compressed (HCC). This also makes sense, because the data is bulk loaded in direct-path mode, and again we don't expect to be updating the data after it has been loaded.  However, Oracle has used QUERY HIGH HCC.  As early as 2016, Roger MacNichol pointed out in Compression in a well-balanced system that “since HCC was released …, DBAs have almost always used Query High” but as CPU speeds have increased “the time has come for a reassessment of role HCC Query Low”.  I have also written about just such a case.

Building My Own Sales History Demo Schema 

I also built my own sales history demo per the standard public scripts. The difference between the two is quite telling. I found that:
  • I can build indexes, not just primary keys and foreign keys (though I understand that has only recently become possible).
  • I can specify tablespaces and storage 
  • I can do some alter session commands but not others due to PDB security profiles.
    • I can't ALTER SESSION SET star_transformation_enabled = TRUE;
    • However, I can set ALTER SESSION SET optimizer_ignore_hints =FALSE, and then I get star transformation by using the STAR_TRANSFORMATION hint. 
  • I can also build B-tree or bitmap secondary indexes. 
 For example, you might choose to index attribute columns on your dimensions.
CREATE UNIQUE INDEX "COUNTRIES_ISO" ON "COUNTRIES" ("COUNTRY_ISO_CODE");
And ADWC will use it.
Plan hash value: 2437708077

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name          | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |               |      1 |        |       |   427 (100)|       |          |       |     45 |00:00:00.11 |    1651 |       |       |          |
|   1 |  SORT GROUP BY                    |               |      1 |    102 |  8874 |   427   (6)| 00:00:01 |       |       |     45 |00:00:00.11 |    1651 |  6144 |  6144 | 6144  (0)|
|*  2 |   HASH JOIN                       |               |      1 |   4210 |   357K|   426   (6)| 00:00:01 |       |       |  64818 |00:00:00.16 |    1651 |  5443K|  5443K| 5260K (0)|
|*  3 |    TABLE ACCESS STORAGE FULL      | PRODUCTS      |      1 |     26 |   208 |     2   (0)| 00:00:01 |       |       |     26 |00:00:00.01 |       7 |  1025K|  1025K|          |
|*  4 |    HASH JOIN                      |               |      1 |  11886 |   916K|   424   (6)| 00:00:01 |       |       |    141K|00:00:00.08 |    1644 |  4266K|  4266K| 4262K (0)|
|   5 |     JOIN FILTER CREATE            | :BF0001       |      1 |    364 |  4368 |     2   (0)| 00:00:01 |       |       |    364 |00:00:00.01 |      16 |       |       |          |
|   6 |      PART JOIN FILTER CREATE      | :BF0000       |      1 |    364 |  4368 |     2   (0)| 00:00:01 |       |       |    364 |00:00:00.01 |      16 |       |       |          |
|*  7 |       TABLE ACCESS STORAGE FULL   | TIMES         |      1 |    364 |  4368 |     2   (0)| 00:00:01 |       |       |    364 |00:00:00.01 |      16 |  1025K|  1025K|          |
|*  8 |     HASH JOIN                     |               |      1 |  48360 |  3164K|   422   (6)| 00:00:01 |       |       |    143K|00:00:00.07 |    1628 |  3376K|  3178K| 4759K (0)|
|   9 |      JOIN FILTER CREATE           | :BF0002       |      1 |   2921 |   128K|    21   (5)| 00:00:01 |       |       |  18520 |00:00:00.02 |     755 |       |       |          |
|  10 |       NESTED LOOPS                |               |      1 |   2921 |   128K|    21   (5)| 00:00:01 |       |       |  18520 |00:00:00.01 |     755 |       |       |          |
|  11 |        TABLE ACCESS BY INDEX ROWID| COUNTRIES     |      1 |      1 |    18 |     1   (0)| 00:00:01 |       |       |      1 |00:00:00.01 |       2 |       |       |          |
|* 12 |         INDEX UNIQUE SCAN         | COUNTRIES_ISO |      1 |      1 |       |     0   (0)|       |          |       |      1 |00:00:00.01 |       1 |  1025K|  1025K|          |
|* 13 |        VIEW                       | VW_GBF_25     |      1 |   2921 | 78867 |    20   (5)| 00:00:01 |       |       |  18520 |00:00:00.01 |     753 |       |       |          |
|  14 |         TABLE ACCESS STORAGE FULL | CUSTOMERS     |      1 |  55500 |  1138K|    20   (5)| 00:00:01 |       |       |  55500 |00:00:00.01 |     753 |  1025K|  1025K|          |
|  15 |      JOIN FILTER USE              | :BF0001       |      1 |    918K|    19M|   396   (5)| 00:00:01 |       |       |    158K|00:00:00.04 |     873 |       |       |          |
|  16 |       JOIN FILTER USE             | :BF0002       |      1 |    918K|    19M|   396   (5)| 00:00:01 |       |       |    158K|00:00:00.04 |     873 |       |       |          |
|  17 |        PARTITION RANGE JOIN-FILTER|               |      1 |    918K|    19M|   396   (5)| 00:00:01 |:BF0000|:BF0000|    158K|00:00:00.04 |     873 |       |       |          |
|* 18 |         TABLE ACCESS STORAGE FULL | SALES         |      5 |    918K|    19M|   396   (5)| 00:00:01 |:BF0000|:BF0000|    158K|00:00:00.04 |     873 |  1025K|  1025K|          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Whether I should do any of this is another matter!

Delivered non-default parameters

There are a number of database initialisation parameters set to non-default values that are noteworthy:
  • QUERY_REWRITE_INTEGRITY=TRUSTED: This ties in with making the foreign key constraints RELY, so that foreign key join elimination will still occur.
  • RESULT_CACHE_MODE=FORCE enables the result cache by default for all queries.  Whether this is going to be a significant benefit depends upon your application.
select  /*+ gather_plan_statistics opt_param('result_cache_mode','MANUAL')*/ 
 c.country_name
, u.cust_state_province
,  COUNT(*) num_sales
,  SUM(s.amount_sold) total_amount_sold
from  sales s
,   customers u
,   products p
,  times t
, countries c
WHERE  s.time_id = t.time_id
AND   s.prod_id = p.prod_id
AND   u.cust_id = s.cust_id
AND u.country_id = c.country_id
AND c.country_iso_code = '&&iso_country_code'
AND p.prod_category_id = 205
and  t.fiscal_year = 1999
GROUP BY c.country_name
, u.cust_state_province
ORDER BY 1,2
/
Without the result cache, I get the standard full scan/Bloom filter execution plan.
Plan hash value: 2719715383

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |           |      1 |        |       |    38 (100)|          |     45 |00:00:00.16 |     657 |       |       |          |
|   1 |  SORT GROUP BY                   |           |      1 |    102 |  8262 |    38  (48)| 00:00:01 |     45 |00:00:00.16 |     657 |  6144 |  6144 | 6144  (0)|
|*  2 |   HASH JOIN                      |           |      1 |   3478 |   275K|    37  (46)| 00:00:01 |  64818 |00:00:00.03 |     657 |  5443K|  5443K| 5259K (0)|
|*  3 |    TABLE ACCESS STORAGE FULL     | PRODUCTS  |      1 |     26 |   208 |     2   (0)| 00:00:01 |     26 |00:00:00.01 |       3 |  1025K|  1025K|          |
|*  4 |    HASH JOIN                     |           |      1 |   9819 |   699K|    35  (49)| 00:00:01 |    141K|00:00:00.18 |     654 |  4266K|  4266K| 4289K (0)|
|   5 |     JOIN FILTER CREATE           | :BF0000   |      1 |    364 |  4368 |     2   (0)| 00:00:01 |    364 |00:00:00.01 |       3 |       |       |          |
|*  6 |      TABLE ACCESS STORAGE FULL   | TIMES     |      1 |    364 |  4368 |     2   (0)| 00:00:01 |    364 |00:00:00.01 |       3 |  1025K|  1025K|          |
|*  7 |     HASH JOIN                    |           |      1 |  39950 |  2379K|    33  (52)| 00:00:01 |    141K|00:00:00.18 |     651 |  3411K|  3411K| 4776K (0)|
|   8 |      JOIN FILTER CREATE          | :BF0001   |      1 |   2413 | 94107 |     9  (12)| 00:00:01 |  18520 |00:00:00.01 |     215 |       |       |          |
|*  9 |       HASH JOIN                  |           |      1 |   2413 | 94107 |     9  (12)| 00:00:01 |  18520 |00:00:00.01 |     215 |  2642K|  2642K|  681K (0)|
|  10 |        JOIN FILTER CREATE        | :BF0002   |      1 |      1 |    18 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |       |       |          |
|* 11 |         TABLE ACCESS STORAGE FULL| COUNTRIES |      1 |      1 |    18 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |  1025K|  1025K|          |
|  12 |        JOIN FILTER USE           | :BF0002   |      1 |  55500 |  1138K|     7  (15)| 00:00:01 |  18520 |00:00:00.01 |     212 |       |       |          |
|* 13 |         TABLE ACCESS STORAGE FULL| CUSTOMERS |      1 |  55500 |  1138K|     7  (15)| 00:00:01 |  18520 |00:00:00.01 |     212 |  1025K|  1025K|          |
|  14 |      JOIN FILTER USE             | :BF0000   |      1 |    918K|    19M|    19  (58)| 00:00:01 |    156K|00:00:00.07 |     436 |       |       |          |
|  15 |       JOIN FILTER USE            | :BF0001   |      1 |    918K|    19M|    19  (58)| 00:00:01 |    156K|00:00:00.07 |     436 |       |       |          |
|* 16 |        TABLE ACCESS STORAGE FULL | SALES     |      1 |    918K|    19M|    19  (58)| 00:00:01 |    156K|00:00:00.07 |     436 |  1025K|  1025K|          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Normally result cache is enabled.  Here the query was not executed because the result was in the cache.
Plan hash value: 2719715383

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                       | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                            |      1 |        |       |    38 (100)|          |     45 |00:00:00.01 |
|   1 |  RESULT CACHE                     | 8992dgrw00p4p9zu2vmq8p3nwg |      1 |        |       |            |          |     45 |00:00:00.01 |
|   2 |   SORT GROUP BY                   |                            |      0 |    102 |  8262 |    38  (48)| 00:00:01 |      0 |00:00:00.01 |
|   3 |    HASH JOIN                      |                            |      0 |   3478 |   275K|    37  (46)| 00:00:01 |      0 |00:00:00.01 |
|   4 |     TABLE ACCESS STORAGE FULL     | PRODUCTS                   |      0 |     26 |   208 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |
|   5 |     HASH JOIN                     |                            |      0 |   9819 |   699K|    35  (49)| 00:00:01 |      0 |00:00:00.01 |
|   6 |      JOIN FILTER CREATE           | :BF0000                    |      0 |    364 |  4368 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |
|   7 |       TABLE ACCESS STORAGE FULL   | TIMES                      |      0 |    364 |  4368 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |
|   8 |      HASH JOIN                    |                            |      0 |  39950 |  2379K|    33  (52)| 00:00:01 |      0 |00:00:00.01 |
|   9 |       JOIN FILTER CREATE          | :BF0001                    |      0 |   2413 | 94107 |     9  (12)| 00:00:01 |      0 |00:00:00.01 |
|  10 |        HASH JOIN                  |                            |      0 |   2413 | 94107 |     9  (12)| 00:00:01 |      0 |00:00:00.01 |
|  11 |         JOIN FILTER CREATE        | :BF0002                    |      0 |      1 |    18 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |
|  12 |          TABLE ACCESS STORAGE FULL| COUNTRIES                  |      0 |      1 |    18 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |
|  13 |         JOIN FILTER USE           | :BF0002                    |      0 |  55500 |  1138K|     7  (15)| 00:00:01 |      0 |00:00:00.01 |
|  14 |          TABLE ACCESS STORAGE FULL| CUSTOMERS                  |      0 |  55500 |  1138K|     7  (15)| 00:00:01 |      0 |00:00:00.01 |
|  15 |       JOIN FILTER USE             | :BF0000                    |      0 |    918K|    19M|    19  (58)| 00:00:01 |      0 |00:00:00.01 |
|  16 |        JOIN FILTER USE            | :BF0001                    |      0 |    918K|    19M|    19  (58)| 00:00:01 |      0 |00:00:00.01 |
|  17 |         TABLE ACCESS STORAGE FULL | SALES                      |      0 |    918K|    19M|    19  (58)| 00:00:01 |      0 |00:00:00.01 |
------------------------------------------------------------------------------------------------------------------------------------------------

Parameter
Value
Comment
_default_pct_free
1
Sets the default value for PCT_FREE to 1% (defaults to 10%) in order to pack data in blocks.
_optimizer_gather_stats_on_load_all
TRUE
Statistics gathered on-line during direct-path insert into a table that is not empty.  Otherwise, this only happens on initial load or during a create-table-as-select command.
_optimizer_gather_stats_on_load_hist
TRUE
Histograms also gathered on-line during direct-path load.
optimizer_ignore_hints
TRUE
Ignore hints embedded in SQL.
optimizer_ignore_parallel_hints
TRUE
Ignore embedded parallel hints.
result_cache_max_size
100M
Maximum size of result cache.
result_cache_max_result
1
Maximum percentage of result cache that one result can use (defaults to 5%).
Inmemory_size
1G
Size of In-memory Column Store, so this feature is enabled.
_cell_offload_vector_groupby
FALSE
In-Memory Aggregation optimisation is disabled (see related white paper).

Other useful references:

Summary

Everything that I have discussed in this series applies to ADWC just as it does to other data warehouses.
  • Your data warehouse should certainly use a strict star schema data model such that you can define foreign and either primary or unique keys in the database.  
  • You should only have equality joins between dimension and fact tables.
  • Wherever possible these should be single column joins and therefore single column foreign and primary keys.  You can consider multi-column keys from Oracle 12.2, however, there are bugs.
  • Whether the database enforces the foreign key constraints is a matter of choice, but you need to define them in order to achieve join elimination.
ADWC is built on Engineered systems
  • So full scan/Bloom filter is going to be able to take advantage of Engineer system optimisations.
    • Bloom Filters are pushed to storage server during smart scan
    • A Bloom Filter on one table can be used against the storage index on another table to skip I/O.
    • Hybrid Columnar Compression further assists smart scans.
  • I would not like to work without enforced primary keys, I think that is courageous!
  • If the database is not going to enforce referential integrity, then foreign key should be made RELY.  As QUERY_REWRITE_INTEGRITY=TRUSTED you will still get foreign key join elimination.
  • You are expected to load data by periodically incrementally bulk loading it.
    • Optimiser statistics are maintained automatically in direct path insert even after the initial insert.  The regular statistics maintenance window is disabled.  If you are going to further transform of data after the initial load, then you will need to think about whether and when to collect statistics.
  • RESULT_CACHE_MODE=FORCE, so the result cache is used by default across the board, with a maximum result size to keep things reasonable.
    • This raises the question whether this should be considered for non-autonomous engineered system data warehouses?
    • However, used incorrectly, the result cache can cause contention problems.

Will Anything run on ADWC?

At the end of the day, ADWC is an Oracle database on an engineered system, although it has been set up and configured in a very particular way.  It is possible to alter ADWC settings to get traditional Star Transformation behaviour and do all the things that you usually do to implement a data warehouse.  However, as I have demonstrated earlier in this series, you probably wouldn't want to do that.
If your legacy data warehouse doesn't follow the principles of good practice, including but not limited to those set out above, then you probably should not be trying to force it into ADWC.  Other platforms, where you have more discretion, are available.

No comments :