Thursday, November 15, 2018

Data Warehouse Design Mistakes 2: Effective Dating Dimensions

This post is part of a series that discusses some common issues in data warehouses.

I have seen situations where some dimensions are effective-dated.  That is to say that there are multiple rows in the dimension table for the same main dimension id, but for different date ranges.  At least one of the date columns has to become part of the primary key.
The dimensions in the Sales History sample schema have got effective from and to dates, but these columns are not part of the primary key.  Let's imagine that the products get updated every year and a new version of the product is sold, requiring a new dimension row.

The Wrong Way

I am going to build a copy of the PRODUCTS table with a new effective dated row for each calendar year in which the product was sold.  The structure and indexing of the BAD_PROUCTS dimension table is unchanged, except that the primary key is not just on PROD_ID but now additionally includes PROD_EFF_FROM
CREATE TABLE bad_products (
    prod_id                     NUMBER(6)       NOT NULL,
    prod_name                   VARCHAR2(50)    NOT NULL,
    prod_desc                   VARCHAR2(4000)  NOT NULL,
    prod_subcategory            VARCHAR2(50)    NOT NULL,
    prod_subcategory_id         NUMBER          NOT NULL,
    prod_subcategory_desc       VARCHAR2(2000)  NOT NULL,
    prod_category               VARCHAR2(50)    NOT NULL,
    prod_category_id            NUMBER          NOT NULL,
    prod_category_desc          VARCHAR2(2000)  NOT NULL,
    prod_weight_class           NUMBER(3)       NOT NULL,
    prod_unit_of_measure        VARCHAR2(20)    ,
    prod_pack_size              VARCHAR2(30)    NOT NULL,
    supplier_id                 NUMBER(6)       NOT NULL,
    prod_status                 VARCHAR2(20)    NOT NULL,
    prod_list_price             NUMBER(8,2)     NOT NULL,
    prod_min_price              NUMBER(8,2)     NOT NULL,
    prod_total                  VARCHAR2(13)    NOT NULL,
    prod_total_id               NUMBER          NOT NULL,
    prod_src_id                 NUMBER          ,
    prod_eff_from               DATE            ,
    prod_eff_to                 DATE            ,
    prod_valid                  VARCHAR2(1)     )
/
ALTER TABLE bad_products
ADD CONSTRAINT bad_products_pk PRIMARY KEY (prod_id, prod_eff_from)
/
Now, I will populate BAD_PRODUCTS by copying PRODUCTS.  I will create rows for each product for each year in which it was sold.  I started with 72 rows in PRODUCTS, but I have 275 rows in BAD_PRODUCTS.
INSERT INTO bad_products
WITH s AS (
SELECT s.prod_id
, TRUNC(s.time_id,'YYYY') time_id
from sales s
GROUP BY s.prod_id, TRUNC(s.time_id,'YYYY')
)
SELECT p.prod_id
,p.prod_name
,p.prod_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_subcategory
,p.prod_subcategory_id
,p.prod_subcategory_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_category
,p.prod_category_id
,p.prod_category_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_weight_class
,p.prod_unit_of_measure
,p.prod_pack_size
,p.supplier_id
,p.prod_status
,p.prod_list_price
,p.prod_min_price
,p.prod_total
,p.prod_total_id
,p.prod_src_id
,s.time_id PROD_EFF_FROM
,ADD_MONTHS(s.time_id,12)-1 PROD_EFF_TO
,p.prod_valid
from products p
,    s
where s.prod_id = p.prod_id
/
Finally, I will build the same indexes as on the PRODUCTS table.
CREATE INDEX bad_products_prod_cat_ix ON bad_products (prod_category);
CREATE BITMAP INDEX bad_products_prod_status_bix ON bad_products (prod_status);
CREATE INDEX bad_products_prod_subcat_ix ON bad_products (prod_subcategory);
I can't now create a foreign key for the product on the SALES table.
I can’t use PROD_ID only because that does not match the primary key.
ALTER TABLE sales
  ADD CONSTRAINT sales_bad_product_fk
  FOREIGN KEY (prod_id) REFERENCES bad_products (prod_id)
;
ERROR at line 3:
ORA-02270: no matching unique or primary key for this column-list
I can't build the foreign key on PROD_ID and TIME_ID, because the TIME_IDs are merely inside the effective range but are not on the PRODUCTS table
ALTER TABLE sales
  ADD CONSTRAINT sales_bad_product_fk
  FOREIGN KEY (prod_id, time_id) REFERENCES bad_products (prod_id, prod_eff_from)
;
ERROR at line 2:
ORA-02298: cannot validate (SH.SALES_BAD_PRODUCT_FK) - parent keys not found
When I come to query the data, I have no choice but to code two inequality conditions on the product table with one of them on a column that is not part of the foreign key.
…
from  sales s
,     bad_products p
,     times t
WHERE s.time_id = t.time_id
AND   s.prod_id = p.prod_id
AND   t.time_id >= p.prod_eff_from 
AND  (t.time_id <= p.prod_eff_to OR p.prod_eff_to IS NULL)
So, even on Oracle 12.2, even if I create the above multi-column constraint NOVALIDATE RELY and set QUERY_REWRITE_INTEGRITY to TRUSTED, I have no chance of being able to eliminate the product table with foreign key join elimination.

The Right Way 

If the product changes and we need a new dimension row for an updated product we will have a new product, with a new product ID, but will with effective dates for that year. Here, I have created an exact copy of the SALES table, called SALES2, but now the product ID is prefixed with the calendar year.
INSERT /*+APPEND*/ INTO sales2
SELECT 
 prod_id+1000*TO_NUMBER(TO_CHAR(time_id,'YYYY')) prod_id
,cust_id
,time_id
,channel_id
,promo_id
,quantity_sold
,amount_sold
FROM sales
/
And similarly, I have created a copy of the PRODUCTS dimension table, called PRODUCTS2 with additional product rows for each year that the product was sold, but again the product ID has been prefixed with the calendar year. The primary key is still just PROD_ID and all the indexes and foreign keys are exactly the same as on the original tables.  Both BAD_PRODUCTS and PRODUCTS2 have the 275 rows.
INSERT INTO products2
WITH s AS (
SELECT s.prod_id
, TRUNC(s.time_id,'YYYY') time_id
FROM sales s
GROUP BY s.prod_id, TRUNC(s.time_id,'YYYY')
)
SELECT p.PROD_ID+(1000*TO_NUMBER(TO_CHAR(s.time_id,'YYYY')))
,p.prod_name
,p.prod_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_subcategory
,p.prod_subcategory_id
,p.prod_subcategory_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_category
,p.prod_category_id
,p.prod_category_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_weight_class
,p.prod_unit_of_measure
,p.prod_pack_size
,p.supplier_id
,p.prod_status
,p.prod_list_price
,p.prod_min_price
,p.prod_total
,p.prod_total_id
,p.prod_src_id
,s.time_id PROD_EFF_FROM
,ADD_MONTHS(s.time_id,12)-1 PROD_EFF_TO
,p.prod_valid
FROM  products p
,     s
WHERE s.prod_id = p.prod_id
/
This is effectively the same query on SALES2 and PRODUCTS2, and it returns the same results. I have kept the inequality criteria on the production effective dates with both SALES and TIMES, although they are not needed here. However, we will see that the way the tables are structured and accessed will have a cost implication.
This time I want to analyse software sales (product category 205) for the USA by fiscal year.
SELECT c.country_name
, p.prod_category_id
, p.prod_category
, t.fiscal_year
, COUNT(*) num_sales
, SUM(s.amount_sold) total_amount_sold
FROM  sales2 s
,     customers u
,     products2 p
,     times t
,     countries c
WHERE s.time_id = t.time_id
AND   s.prod_id = p.prod_id
AND   u.cust_id = s.cust_id
AND   u.country_id = c.country_id
AND   c.country_iso_code = 'US'
AND   p.prod_category_id = 205
AND   t.time_id >= p.prod_eff_from 
AND  (t.time_id <= p.prod_eff_to OR p.prod_eff_to IS NULL)
AND   s.time_id >= p.prod_eff_from 
AND  (s.time_id <= p.prod_eff_to OR p.prod_eff_to IS NULL)
GROUP BY c.country_name
, p.prod_category_id
, p.prod_category
, t.fiscal_year
ORDER BY 1,2
/

With Query Transformation

First the wrong way with the BAD_PROUCTS table
Plan hash value: 2522704620

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                        | Name                         | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                 |                              |      1 |        |       |  2000 (100)|          |       |       |      5 |00:00:06.60 |     305K|       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION                       |                              |      1 |        |       |            |          |       |       |      5 |00:00:06.60 |     305K|       |       |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)        | SYS_TEMP_0FD9D7230_A4BC21    |      1 |        |       |            |          |       |       |      0 |00:00:00.08 |    1525 |  1024 |  1024 |          |
|*  3 |    HASH JOIN                                     |                              |      1 |   2921 | 81788 |   418   (1)| 00:00:01 |       |       |  18520 |00:00:00.05 |    1524 |  1185K|  1185K|  721K (0)|
|*  4 |     TABLE ACCESS FULL                            | COUNTRIES                    |      1 |      1 |    18 |     2   (0)| 00:00:01 |       |       |      1 |00:00:00.01 |       2 |       |       |          |
|   5 |     TABLE ACCESS FULL                            | CUSTOMERS                    |      1 |  55500 |   541K|   416   (1)| 00:00:01 |       |       |  55500 |00:00:00.02 |    1521 |       |       |          |
|   6 |   SORT GROUP BY                                  |                              |      1 |      5 |   495 |  1582   (1)| 00:00:01 |       |       |      5 |00:00:06.52 |     303K|  2048 |  2048 | 2048  (0)|
|*  7 |    HASH JOIN                                     |                              |      1 |  11568 |  1118K|  1580   (1)| 00:00:01 |       |       |    237K|00:00:06.29 |     303K|  2290K|  1666K| 2158K (0)|
|   8 |     TABLE ACCESS FULL                            | SYS_TEMP_0FD9D7230_A4BC21    |      1 |   2921 | 43815 |     4   (0)| 00:00:01 |       |       |  18520 |00:00:00.01 |       0 |       |       |          |
|*  9 |     HASH JOIN                                    |                              |      1 |  11568 |   948K|  1576   (1)| 00:00:01 |       |       |    237K|00:00:06.18 |     303K|  1744K|  1744K| 1704K (0)|
|  10 |      TABLE ACCESS FULL                           | TIMES                        |      1 |   1826 | 21912 |    16   (0)| 00:00:01 |       |       |   1826 |00:00:00.01 |      55 |       |       |          |
|* 11 |      HASH JOIN                                   |                              |      1 |  30764 |  2163K|  1559   (1)| 00:00:01 |       |       |    237K|00:00:06.09 |     303K|  1115K|  1115K| 1352K (0)|
|* 12 |       TABLE ACCESS FULL                          | BAD_PRODUCTS                 |      1 |    102 |  4284 |     5   (0)| 00:00:01 |       |       |    102 |00:00:00.01 |      14 |       |       |          |
|  13 |       VIEW                                       | VW_ST_F7251F01               |      1 |  48359 |  1416K|  1554   (1)| 00:00:01 |       |       |    237K|00:00:05.02 |     303K|       |       |          |
|  14 |        NESTED LOOPS                              |                              |      1 |  48359 |  2361K|  1526   (1)| 00:00:01 |       |       |    237K|00:00:04.82 |     303K|       |       |          |
|  15 |         PARTITION RANGE ALL                      |                              |      1 |  48359 |   991K|   397   (2)| 00:00:01 |     1 |    28 |    237K|00:00:03.36 |     302K|       |       |          |
|  16 |          BITMAP CONVERSION TO ROWIDS             |                              |     28 |  48359 |   991K|   397   (2)| 00:00:01 |       |       |    237K|00:00:03.15 |     302K|       |       |          |
|  17 |           BITMAP AND                             |                              |     28 |        |       |            |          |       |       |     16 |00:00:03.01 |     302K|       |       |          |
|  18 |            BITMAP MERGE                          |                              |     28 |        |       |            |          |       |       |     16 |00:00:00.04 |    1819 |  1024K|   512K|36864  (0)|
|  19 |             BITMAP KEY ITERATION                 |                              |     28 |        |       |            |          |       |       |   1572 |00:00:00.03 |    1819 |       |       |          |
|  20 |              BUFFER SORT                         |                              |     28 |        |       |            |          |       |       |   2856 |00:00:00.01 |       9 | 73728 | 73728 |          |
|* 21 |               TABLE ACCESS BY INDEX ROWID BATCHED| BAD_PRODUCTS                 |      1 |    102 |   816 |    24   (0)| 00:00:01 |       |       |    102 |00:00:00.01 |       9 |       |       |          |
|  22 |                BITMAP CONVERSION TO ROWIDS       |                              |      1 |        |       |            |          |       |       |    275 |00:00:00.01 |       1 |       |       |          |
|  23 |                 BITMAP INDEX FULL SCAN           | BAD_PRODUCTS_PROD_STATUS_BIX |      1 |        |       |            |          |       |       |      1 |00:00:00.01 |       1 |       |       |          |
|* 24 |              BITMAP INDEX RANGE SCAN             | SALES_PROD_BIX               |   2856 |        |       |            |          |     1 |    28 |   1572 |00:00:00.02 |    1810 |       |       |          |
|  25 |            BITMAP MERGE                          |                              |     28 |        |       |            |          |       |       |     16 |00:00:02.97 |     301K|  8316K|  1039K|  317K (0)|
|  26 |             BITMAP KEY ITERATION                 |                              |     28 |        |       |            |          |       |       |  19186 |00:00:02.85 |     301K|       |       |          |
|  27 |              BUFFER SORT                         |                              |     28 |        |       |            |          |       |       |    296K|00:00:00.42 |       0 |    26M|  1871K|  865K (0)|
|  28 |               TABLE ACCESS FULL                  | SYS_TEMP_0FD9D7230_A4BC21    |      5 |   2921 | 14605 |     4   (0)| 00:00:01 |       |       |  18520 |00:00:00.01 |       0 |       |       |          |
|* 29 |              BITMAP INDEX RANGE SCAN             | SALES_CUST_BIX               |    296K|        |       |            |          |     1 |    28 |  19186 |00:00:01.97 |     301K|       |       |          |
|  30 |         TABLE ACCESS BY USER ROWID               | SALES                        |    237K|      1 |    29 |  1157   (1)| 00:00:01 | ROWID | ROWID |    237K|00:00:00.92 |     829 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
…
Predicate Information (identified by operation id):
---------------------------------------------------
…
  11 - access("ITEM_3"="P"."PROD_ID")
       filter(("ITEM_1">="P"."PROD_EFF_FROM" AND ("ITEM_1"<="P"."PROD_EFF_TO" OR "P"."PROD_EFF_TO" IS NULL)))
…
And now the right way with PRODUCTS2. At first glance the plans are not very different, both plans do a star transformation, but the devil is in the detail.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                        | Name                      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                 |                           |      1 |        |       |  1207 (100)|          |       |       |      5 |00:00:05.08 |     304K|       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION                       |                           |      1 |        |       |            |          |       |       |      5 |00:00:05.08 |     304K|       |       |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)        | SYS_TEMP_0FD9D7231_A4BC21 |      1 |        |       |            |          |       |       |      0 |00:00:00.07 |    1525 |  1024 |  1024 |          |
|*  3 |    HASH JOIN                                     |                           |      1 |   2921 | 81788 |   418   (1)| 00:00:01 |       |       |  18520 |00:00:00.05 |    1524 |  1185K|  1185K|  689K (0)|
|*  4 |     TABLE ACCESS FULL                            | COUNTRIES                 |      1 |      1 |    18 |     2   (0)| 00:00:01 |       |       |      1 |00:00:00.01 |       2 |       |       |          |
|   5 |     TABLE ACCESS FULL                            | CUSTOMERS                 |      1 |  55500 |   541K|   416   (1)| 00:00:01 |       |       |  55500 |00:00:00.02 |    1521 |       |       |          |
|   6 |   SORT GROUP BY                                  |                           |      1 |      5 |   435 |   789   (1)| 00:00:01 |       |       |      5 |00:00:05.01 |     302K|  2048 |  2048 | 2048  (0)|
|*  7 |    HASH JOIN                                     |                           |      1 |  17849 |  1516K|   787   (1)| 00:00:01 |       |       |    237K|00:00:04.83 |     302K|  2290K|  1666K| 2123K (0)|
|   8 |     TABLE ACCESS FULL                            | SYS_TEMP_0FD9D7231_A4BC21 |      1 |   2921 | 43815 |     4   (0)| 00:00:01 |       |       |  18520 |00:00:00.01 |       0 |       |       |          |
|*  9 |     HASH JOIN                                    |                           |      1 |  17849 |  1255K|   782   (1)| 00:00:01 |       |       |    237K|00:00:04.74 |     302K|  1744K|  1744K| 1704K (0)|
|  10 |      TABLE ACCESS FULL                           | TIMES                     |      1 |   1826 | 21912 |    16   (0)| 00:00:01 |       |       |   1826 |00:00:00.01 |      55 |       |       |          |
|* 11 |      HASH JOIN                                   |                           |      1 |  17849 |  1045K|   766   (1)| 00:00:01 |       |       |    237K|00:00:04.68 |     302K|  1298K|  1298K| 1658K (0)|
|* 12 |       TABLE ACCESS FULL                          | PRODUCTS2                 |      1 |    102 |  2856 |     5   (0)| 00:00:01 |       |       |    102 |00:00:00.01 |      14 |       |       |          |
|  13 |       VIEW                                       | VW_ST_F7251F01            |      1 |  17937 |   560K|   761   (1)| 00:00:01 |       |       |    237K|00:00:04.18 |     302K|       |       |          |
|  14 |        NESTED LOOPS                              |                           |      1 |  17937 |   963K|   733   (1)| 00:00:01 |       |       |    237K|00:00:03.99 |     302K|       |       |          |
|  15 |         PARTITION RANGE ALL                      |                           |      1 |  17936 |   402K|   395   (1)| 00:00:01 |     1 |    28 |    237K|00:00:02.76 |     302K|       |       |          |
|  16 |          BITMAP CONVERSION TO ROWIDS             |                           |     28 |  17936 |   402K|   395   (1)| 00:00:01 |       |       |    237K|00:00:02.60 |     302K|       |       |          |
|  17 |           BITMAP AND                             |                           |     28 |        |       |            |          |       |       |     16 |00:00:02.49 |     302K|       |       |          |
|  18 |            BITMAP MERGE                          |                           |     28 |        |       |            |          |       |       |     16 |00:00:00.03 |    1700 |  1024K|   512K| 9216  (0)|
|  19 |             BITMAP KEY ITERATION                 |                           |     28 |        |       |            |          |       |       |    399 |00:00:00.02 |    1700 |       |       |          |
|  20 |              BUFFER SORT                         |                           |     28 |        |       |            |          |       |       |   2856 |00:00:00.01 |       9 | 73728 | 73728 |          |
|* 21 |               TABLE ACCESS BY INDEX ROWID BATCHED| PRODUCTS2                 |      1 |    102 |  1020 |    24   (0)| 00:00:01 |       |       |    102 |00:00:00.01 |       9 |       |       |          |
|  22 |                BITMAP CONVERSION TO ROWIDS       |                           |      1 |        |       |            |          |       |       |    275 |00:00:00.01 |       1 |       |       |          |
|  23 |                 BITMAP INDEX FULL SCAN           | PRODUCTS2_PROD_STATUS_BIX |      1 |        |       |            |          |       |       |      1 |00:00:00.01 |       1 |       |       |          |
|* 24 |              BITMAP INDEX RANGE SCAN             | SALES2_PROD_BIX           |   2856 |        |       |            |          |     1 |    28 |    399 |00:00:00.01 |    1691 |       |       |          |
|  25 |            BITMAP MERGE                          |                           |     28 |        |       |            |          |       |       |     16 |00:00:02.46 |     300K|  8400K|  1050K|  312K (0)|
|  26 |             BITMAP KEY ITERATION                 |                           |     28 |        |       |            |          |       |       |  19192 |00:00:02.36 |     300K|       |       |          |
|  27 |              BUFFER SORT                         |                           |     28 |        |       |            |          |       |       |    296K|00:00:00.33 |       0 |    26M|  1871K|  865K (0)|
|  28 |               TABLE ACCESS FULL                  | SYS_TEMP_0FD9D7231_A4BC21 |      5 |   2921 | 14605 |     4   (0)| 00:00:01 |       |       |  18520 |00:00:00.01 |       0 |       |       |          |
|* 29 |              BITMAP INDEX RANGE SCAN             | SALES2_CUST_BIX           |    296K|        |       |            |          |     1 |    28 |  19192 |00:00:01.63 |     300K|       |       |          |
|  30 |         TABLE ACCESS BY USER ROWID               | SALES2                    |    237K|      1 |    32 |   366   (1)| 00:00:01 | ROWID | ROWID |    237K|00:00:00.76 |     830 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
…
Predicate Information (identified by operation id):
---------------------------------------------------
…
   9 - access("ITEM_3"="T"."TIME_ID")
  11 - access("ITEM_2"="P"."PROD_ID")
…
The key differences are:
  • At line 24 of both plans. 1572 rows are returned from the SALES_PROD_BIX index because it is returning every effective dated software product, but only 399 rows are returned from SALES2_PROD_BID because the PROD_ID is different for every fiscal year.
  • At line 11, hashing the BAD_PRODUCTS table took 1.07s because it was necessary to evaluate the inequality conditions, whereas it only took 0.50s to hash PRODUCTS2 with a simple equality join. 
  • The optimizer cost with PRODUCTS2 is much lower, and the overall response time is 5.08s instead of 6.60s.

Without Query Transformation 

If I repeat the test without star transformation enabled, I again see differences in the hash joining of the product dimensions. First on BAD_PRODUCTS
Plan hash value: 2324880021

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name         | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |              |      1 |        |       |  1592 (100)|          |       |       |      5 |00:00:01.94 |    5805 |   4128 |       |       |          |
|   1 |  SORT GROUP BY                  |              |      1 |     11 |  1144 |  1592   (3)| 00:00:01 |       |       |      5 |00:00:01.94 |    5805 |   4128 |  2048 |  2048 | 2048  (0)|
|*  2 |   HASH JOIN                     |              |      1 |   9845 |   999K|  1590   (3)| 00:00:01 |       |       |    237K|00:00:01.74 |    5805 |   4128 |  1744K|  1744K| 1620K (0)|
|   3 |    PART JOIN FILTER CREATE      | :BF0000      |      1 |   1826 | 21912 |    16   (0)| 00:00:01 |       |       |   1826 |00:00:00.01 |      55 |      0 |       |       |          |
|   4 |     TABLE ACCESS FULL           | TIMES        |      1 |   1826 | 21912 |    16   (0)| 00:00:01 |       |       |   1826 |00:00:00.01 |      55 |      0 |       |       |          |
|*  5 |    HASH JOIN                    |              |      1 |  26182 |  2352K|  1574   (3)| 00:00:01 |       |       |    237K|00:00:01.65 |    5700 |   4128 |  1115K|  1115K| 1390K (0)|
|*  6 |     TABLE ACCESS FULL           | BAD_PRODUCTS |      1 |    102 |  4284 |     5   (0)| 00:00:01 |       |       |    102 |00:00:00.01 |      14 |      0 |       |       |          |
|*  7 |     HASH JOIN                   |              |      1 |  48360 |  2361K|  1568   (3)| 00:00:01 |       |       |    526K|00:00:01.07 |    5685 |   4128 |  2290K|  1666K| 2216K (0)|
|*  8 |      HASH JOIN                  |              |      1 |   2921 | 81788 |   418   (1)| 00:00:01 |       |       |  18520 |00:00:00.04 |    1524 |      0 |  1236K|  1236K|  771K (0)|
|*  9 |       TABLE ACCESS FULL         | COUNTRIES    |      1 |      1 |    18 |     2   (0)| 00:00:01 |       |       |      1 |00:00:00.01 |       2 |      0 |       |       |          |
|  10 |       TABLE ACCESS FULL         | CUSTOMERS    |      1 |  55500 |   541K|   416   (1)| 00:00:01 |       |       |  55500 |00:00:00.02 |    1521 |      0 |       |       |          |
|  11 |      PARTITION RANGE JOIN-FILTER|              |      1 |    918K|    19M|  1142   (3)| 00:00:01 |:BF0000|:BF0000|    918K|00:00:00.77 |    4160 |   4128 |       |       |          |
|  12 |       TABLE ACCESS FULL         | SALES        |     20 |    918K|    19M|  1142   (3)| 00:00:01 |:BF0000|:BF0000|    918K|00:00:00.76 |    4160 |   4128 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("S"."TIME_ID"="T"."TIME_ID")
       filter((("T"."TIME_ID"<="P"."PROD_EFF_TO" OR "P"."PROD_EFF_TO" IS NULL) AND "T"."TIME_ID">="P"."PROD_EFF_FROM"))
   5 - access("S"."PROD_ID"="P"."PROD_ID")
       filter(("S"."TIME_ID">="P"."PROD_EFF_FROM" AND ("S"."TIME_ID"<="P"."PROD_EFF_TO" OR "P"."PROD_EFF_TO" IS NULL)))
And then on PRODUCTS2
Plan hash value: 1297664275

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |           |      1 |        |       |   875 (100)|          |       |       |      5 |00:00:00.92 |    3126 |       |       |          |
|   1 |  SORT GROUP BY                  |           |      1 |     11 |  1012 |   875   (4)| 00:00:01 |       |       |      5 |00:00:00.92 |    3126 |  2048 |  2048 | 2048  (0)|
|*  2 |   HASH JOIN                     |           |      1 |  17849 |  1603K|   873   (4)| 00:00:01 |       |       |    237K|00:00:00.75 |    3126 |  1744K|  1744K| 1621K (0)|
|   3 |    PART JOIN FILTER CREATE      | :BF0000   |      1 |   1826 | 21912 |    16   (0)| 00:00:01 |       |       |   1826 |00:00:00.01 |      55 |       |       |          |
|   4 |     TABLE ACCESS FULL           | TIMES     |      1 |   1826 | 21912 |    16   (0)| 00:00:01 |       |       |   1826 |00:00:00.01 |      55 |       |       |          |
|*  5 |    HASH JOIN                    |           |      1 |  17849 |  1394K|   857   (4)| 00:00:01 |       |       |    237K|00:00:00.70 |    3070 |  1298K|  1298K| 1703K (0)|
|*  6 |     TABLE ACCESS FULL           | PRODUCTS2 |      1 |    102 |  2856 |     5   (0)| 00:00:01 |       |       |    102 |00:00:00.01 |      14 |       |       |          |
|*  7 |     HASH JOIN                   |           |      1 |  48360 |  2455K|   852   (4)| 00:00:01 |       |       |    526K|00:00:00.59 |    3055 |  2290K|  1666K| 2216K (0)|
|*  8 |      HASH JOIN                  |           |      1 |   2921 | 81788 |   418   (1)| 00:00:01 |       |       |  18520 |00:00:00.04 |    1524 |  1236K|  1236K|  774K (0)|
|*  9 |       TABLE ACCESS FULL         | COUNTRIES |      1 |      1 |    18 |     2   (0)| 00:00:01 |       |       |      1 |00:00:00.01 |       2 |       |       |          |
|  10 |       TABLE ACCESS FULL         | CUSTOMERS |      1 |  55500 |   541K|   416   (1)| 00:00:01 |       |       |  55500 |00:00:00.03 |    1521 |       |       |          |
|  11 |      PARTITION RANGE JOIN-FILTER|           |      1 |    918K|    21M|   426   (6)| 00:00:01 |:BF0000|:BF0000|    918K|00:00:00.35 |    1530 |       |       |          |
|  12 |       TABLE ACCESS FULL         | SALES2    |     20 |    918K|    21M|   426   (6)| 00:00:01 |:BF0000|:BF0000|    918K|00:00:00.35 |    1530 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  • One difference is where we hash join the product dimension. BAD_PRODUCTS takes 0.58s, PRODUCTS2 takes just 0.11s.
  • The Bloom filter on SALES2 takes just 0.35s, while the filter on SALES takes 0.76s and requires fewer buffers because the PROD_ID equality join is more selective.

Summary 

  • Effective dating leads to inequality conditions between fact and dimension tables.
    • Can't express inequalities in foreign key – it has to be an equality join to primary key/unique key.
    • It is impossible to achieve foreign key join elimination on these dimensions.
    • You have to do more work evaluating the inequality conditions when you eventually hash join these dimensions into the result set. 
    • Bloom Filters only work with equality predicates. They rough match data, but there can be false positive results. The subsequent hash join does an exact match on a reduced volume of data. The Bloom cannot reduce the data on the inequality conditions, so they return more data to the hash operation that then takes longer to execute.
  • Conclusion
    • You should rigorously follow the star schema principles so you can always define a foreign key between facts and dimensions.

No comments :