When you query a star schema, you essentially have two choices;
- bitmap index and star transformation
- full scan, bloom filter, and hash join
Star Transformation
Star transformation was introduced in Oracle 8i (see also Oracle Optimizer Blog: Optimizer Transformations: Star Transformation). A star transformation requires:- A bitmap index on each foreign key column(s) on the FACT table
- Though you don't actually need the foreign key to be defined
- An index on primary key on the DIMENSION table
- Set STAR_TRANSFORMATION_ENABLED=TRUE to enable the transformation.
- Or use the /*+STAR_TRANSFORMATION*/ hint.
If you were going to try to do this with conventional b-tree indexes, you would need an index for each permutation of foreign key combinations.
For example, this query reports US sales in 1999 by customer state:
The execution plan below shows:
Bloom Filters were originally introduced into Oracle in 10gR2, but there was no documentation. See also:
The exact balance point between the two will depend on many factors, the point here is the trend.
That is WHY bitmaps are not recommended - if you update a bitmap indexed column OR you insert/delete from the table (which will modify the bitmap index), you'll find serialization at best and deadlocks all over the place at worst" (Tom Kyte, 2007).
You may also see the size of the bitmap index grow significantly and quickly.
Jonathan Lewis has written extensively on the subject of Bitmap Indexes on his blog:
So bitmap indexes were really only ever an option where you bulk load the data, and then the data is static while you query it. You might even drop the index prior to loading data and then rebuild it again afterward. They are fast to build because there is no sort involved.
In choosing whether to still use bitmaps indexes instead of Bloom filters, you have to decide whether the improved performance for a probably small number of queries that return a small volume of data is worth the challenges associated with bitmap indexes. In general, I think it probably isn't.
For example, this query reports US sales in 1999 by customer state:
SELECT c.country_name
, u.cust_state_province
, COUNT(*) num_sales
, SUM(s.amount_sold) total_amount_sold
from sales s
, customers u
, products p
, times t
, countries c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND u.cust_id = s.cust_id
AND u.country_id = c.country_id
AND c.country_iso_code = 'US'
AND t.fiscal_year = 1999
GROUP BY c.country_name
, u.cust_state_province
ORDER BY 1,2
/
The execution plan below shows:
- Line 2: A temporary table that is the combination of countries and customers is created.
- Line 18: Bitmap look-up of TIMES dimensions on the SALES_TIME_BIX bitmap index.
- Line 23: Bitmap look-up of the temporary table against the SALES_CUST_BIX bitmap index
- Line 13: A bitmap AND is performed of the two bitmap results
- Line 8: The temporary table is used again to join in the customer and state description.
Plan hash value: 3988189767
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2133 (100)| | | | 45 |00:00:02.74 | 98078 | 195 | | | |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | 45 |00:00:02.74 | 98078 | 195 | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D77CE_A4BC21 | 1 | | | | | | | 0 |00:00:00.08 | 1525 | 0 | 1024 | 1024 | |
| 3 | HASH JOIN | | 1 | 2921 | 111K| 418 (1)| 00:00:01 | | | 18520 |00:00:00.05 | 1524 | 0 | 1185K| 1185K| 599K (0)|
| 4 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | 0 | | | |
| 5 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 1138K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.02 | 1521 | 0 | | | |
| 6 | SORT GROUP BY | | 1 | 2359 | 101K| 1715 (1)| 00:00:01 | | | 45 |00:00:02.66 | 96552 | 195 | 6144 | 6144 | 6144 (0)|
| 7 | HASH JOIN | | 1 | 12057 | 518K| 1713 (1)| 00:00:01 | | | 141K|00:00:02.49 | 96552 | 195 | 2391K| 1595K| 2015K (0)|
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D77CE_A4BC21 | 1 | 2921 | 75946 | 6 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | 0 | | | |
| 9 | VIEW | VW_ST_B6F3B15C | 1 | 12057 | 211K| 1707 (1)| 00:00:01 | | | 141K|00:00:02.18 | 96552 | 195 | | | |
| 10 | NESTED LOOPS | | 1 | 12057 | 553K| 1684 (1)| 00:00:01 | | | 141K|00:00:02.08 | 96552 | 195 | | | |
| 11 | PARTITION RANGE SUBQUERY | | 1 | 12056 | 294K| 322 (1)| 00:00:01 |KEY(SQ)|KEY(SQ)| 141K|00:00:01.05 | 96102 | 0 | | | |
| 12 | BITMAP CONVERSION TO ROWIDS | | 5 | 12056 | 294K| 322 (1)| 00:00:01 | | | 141K|00:00:00.94 | 96047 | 0 | | | |
| 13 | BITMAP AND | | 5 | | | | | | | 5 |00:00:00.87 | 96047 | 0 | | | |
| 14 | BITMAP MERGE | | 5 | | | | | | | 5 |00:00:00.02 | 1914 | 0 | 1024K| 512K|40960 (0)|
| 15 | BITMAP KEY ITERATION | | 5 | | | | | | | 364 |00:00:00.02 | 1914 | 0 | | | |
| 16 | BUFFER SORT | | 5 | | | | | | | 1820 |00:00:00.01 | 55 | 0 | 73728 | 73728 | |
| 17 | TABLE ACCESS FULL | TIMES | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
| 18 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | 1820 | | | | |KEY(SQ)|KEY(SQ)| 364 |00:00:00.01 | 1859 | 0 | | | |
| 19 | BITMAP MERGE | | 5 | | | | | | | 5 |00:00:00.84 | 94133 | 0 | 8624K| 1078K| 319K (0)|
| 20 | BITMAP KEY ITERATION | | 5 | | | | | | | 6566 |00:00:00.81 | 94133 | 0 | | | |
| 21 | BUFFER SORT | | 5 | | | | | | | 92600 |00:00:00.14 | 0 | 0 | 28M| 1930K| 928K (0)|
| 22 | TABLE ACCESS FULL | SYS_TEMP_0FD9D77CE_A4BC21 | 1 | 2921 | 14605 | 6 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | 0 | | | |
| 23 | BITMAP INDEX RANGE SCAN | SALES_CUST_BIX | 92600 | | | | |KEY(SQ)|KEY(SQ)| 6566 |00:00:00.54 | 94133 | 0 | | | |
| 24 | TABLE ACCESS BY USER ROWID | SALES | 141K| 1 | 22 | 1385 (1)| 00:00:01 | ROWID | ROWID | 141K|00:00:00.76 | 450 | 195 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Note
-----
- cbqt star transformation used for this statement
- rely constraint used for this statement
The temporary table optimisation seen above for the join back to the dimension table was introduced in 10g. It can be suppressed by setting:- STAR_TRANSFORMATION_ENABLED=TEMP_DISABLE
Plan hash value: 2782741738
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2950 (100)| | | | 45 |00:00:02.91 | 99599 | | | |
| 1 | SORT GROUP BY | | 1 | 45 | 2565 | 2950 (1)| 00:00:01 | | | 45 |00:00:02.91 | 99599 | 6144 | 6144 | 6144 (0)|
|* 2 | HASH JOIN | | 1 | 635 | 36195 | 2949 (1)| 00:00:01 | | | 141K|00:00:02.77 | 99599 | 12M| 2994K| 16M (0)|
| 3 | MERGE JOIN CARTESIAN | | 1 | 12057 | 423K| 2533 (1)| 00:00:01 | | | 141K|00:00:02.39 | 98078 | | | |
|* 4 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
| 5 | BUFFER SORT | | 1 | 12057 | 211K| 2531 (1)| 00:00:01 | | | 141K|00:00:02.29 | 98076 | 6219K| 1010K| 5527K (0)|
| 6 | VIEW | VW_ST_958ED0D5 | 1 | 12057 | 211K| 2531 (1)| 00:00:01 | | | 141K|00:00:02.03 | 98076 | | | |
| 7 | NESTED LOOPS | | 1 | 12057 | 553K| 2097 (1)| 00:00:01 | | | 141K|00:00:01.92 | 98076 | | | |
| 8 | PARTITION RANGE SUBQUERY | | 1 | 12056 | 294K| 734 (1)| 00:00:01 |KEY(SQ)|KEY(SQ)| 141K|00:00:01.16 | 97626 | | | |
| 9 | BITMAP CONVERSION TO ROWIDS| | 5 | 12056 | 294K| 734 (1)| 00:00:01 | | | 141K|00:00:01.05 | 97571 | | | |
| 10 | BITMAP AND | | 5 | | | | | | | 5 |00:00:00.98 | 97571 | | | |
| 11 | BITMAP MERGE | | 5 | | | | | | | 5 |00:00:00.02 | 1914 | 1024K| 512K|40960 (0)|
| 12 | BITMAP KEY ITERATION | | 5 | | | | | | | 364 |00:00:00.02 | 1914 | | | |
| 13 | BUFFER SORT | | 5 | | | | | | | 1820 |00:00:00.01 | 55 | 73728 | 73728 | |
|* 14 | TABLE ACCESS FULL | TIMES | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | | | |
|* 15 | BITMAP INDEX RANGE SCAN| SALES_TIME_BIX | 1820 | | | | |KEY(SQ)|KEY(SQ)| 364 |00:00:00.01 | 1859 | | | |
| 16 | BITMAP MERGE | | 5 | | | | | | | 5 |00:00:00.96 | 95657 | 8624K| 1078K| 319K (0)|
| 17 | BITMAP KEY ITERATION | | 5 | | | | | | | 6566 |00:00:00.92 | 95657 | | | |
| 18 | BUFFER SORT | | 5 | | | | | | | 92600 |00:00:00.21 | 1524 | 36M| 2148K| 1180K (0)|
|* 19 | HASH JOIN | | 1 | 2921 | 52578 | 418 (1)| 00:00:01 | | | 18520 |00:00:00.04 | 1524 | 1922K| 1922K| 565K (0)|
|* 20 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 8 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
| 21 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 541K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.02 | 1521 | | | |
|* 22 | BITMAP INDEX RANGE SCAN| SALES_CUST_BIX | 92600 | | | | |KEY(SQ)|KEY(SQ)| 6566 |00:00:00.58 | 94133 | | | |
| 23 | TABLE ACCESS BY USER ROWID | SALES | 141K| 1 | 22 | 1797 (1)| 00:00:01 | ROWID | ROWID | 141K|00:00:00.48 | 450 | | | |
| 24 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 1138K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.03 | 1521 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Without Star Transformation: Full Scan-Bloom Filter
A Bloom filter is a form of bitmap. Each dimension value is hashed and maps to a bit in the map. The fact table is pushed through the same hash, and if the bit in the map was set by hashing the dimensions, then it is a possible match. It is only a possible match because multiple values might hash to the same value. So, you can get false positives that Oracle then filters out by exactly joining them with a hash-join. However, this hash is much smaller than if the Bloom filter had not been used.Bloom Filters were originally introduced into Oracle in 10gR2, but there was no documentation. See also:
- "What is a Bloom filter?" by Christian Antognini
- "Bloom Filters" by Christian Antognini, DOAG 2016
- Interactive Java Demo by Jason Davis:
Plan hash value: 2588754131
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1586 (100)| | | | 45 |00:00:00.46 | 2063 | 472 | | | |
| 1 | SORT GROUP BY | | 1 | 45 | 3105 | 1586 (3)| 00:00:01 | | | 45 |00:00:00.46 | 2063 | 472 | 6144 | 6144 | 6144 (0)|
|* 2 | HASH JOIN | | 1 | 12057 | 812K| 1585 (3)| 00:00:01 | | | 141K|00:00:00.31 | 2063 | 472 | 1995K| 1995K| 1655K (0)|
| 3 | PART JOIN FILTER CREATE | :BF0000 | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
|* 4 | TABLE ACCESS FULL | TIMES | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
|* 5 | HASH JOIN | | 1 | 48360 | 2691K| 1568 (3)| 00:00:01 | | | 169K|00:00:00.27 | 2007 | 472 | 2391K| 1595K| 1990K (0)|
|* 6 | HASH JOIN | | 1 | 2921 | 111K| 418 (1)| 00:00:01 | | | 18520 |00:00:00.03 | 1524 | 0 | 1236K| 1236K| 728K (0)|
|* 7 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | 0 | | | |
| 8 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 1138K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.02 | 1521 | 0 | | | |
| 9 | PARTITION RANGE JOIN-FILTER| | 1 | 918K| 15M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.15 | 482 | 472 | | | |
| 10 | TABLE ACCESS FULL | SALES | 5 | 918K| 15M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.15 | 482 | 472 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Which is better?
It depends. Often Bloom filter performance is better, but on queries that return very few values, you might find the star transformation outperforms the Bloom filter. The Oracle documentation says that star transformation is a cost-based optimisation. It is only used if the plan is cheaper. Although I have found some cases where it star transforms to a more expensive plan.- "The optimizer performs the star transformation by identifying the fact and constraint dimension tables automatically. The optimizer performs the star transformation only if the cost of the transformed plan is lower than the alternatives. Also, the optimizer attempts temporary table transformation automatically whenever materialization improves performance"
- "The STAR_TRANSFORMATION hint instructs the optimizer to use the best plan in which the transformation has been used. Without the hint, the optimizer could make a query optimization decision to use the best plan generated without the transformation, instead of the best plan for the transformed query"
select c.country_name
, u.cust_state_province
, COUNT(*) num_sales
, SUM(s.amount_sold) total_amount_sold
from sales s
, customers u
, products p
, times t
, countries c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND u.cust_id = s.cust_id
AND u.country_id = c.country_id
AND c.country_iso_code = '&&iso_country_code'
AND p.prod_category_id = 205
and t.fiscal_year = 1999
GROUP BY c.country_name
, u.cust_state_province
ORDER BY 1,2
/
The data by country is highly skewed. That will let me compare star transformation -v- Bloom filter for different data volumes. As the number of sales decreases the elapsed time of the star transformation falls below that of the Bloom filter. The faster query is shown in bold.
ISO Country Code
|
Country Name
|
Number of Sales in 1999
|
Number of Sales
|
Star Transformation
|
Full Scan-Bloom Filter
|
||
A-Time
|
Buffers
|
A-Time
|
Buffers
|
||||
US
|
United States of America
|
2662
|
526212
|
1.76
|
97999
|
0.55
|
2068
|
DE
|
Germany
|
561
|
81978
|
0.51
|
45572
|
0.29
|
2068
|
JP
|
Japan
|
281
|
60183
|
0.16
|
7118
|
0.23
|
2068
|
GB
|
United Kingdom
|
391
|
58638
|
0.38
|
42339
|
0.25
|
2068
|
IT
|
Italy
|
257
|
42570
|
0.53
|
43526
|
0.38
|
2068
|
AU
|
Australia
|
228
|
33685
|
0.14
|
8113
|
0.26
|
2068
|
FR
|
France
|
161
|
33078
|
0.3
|
23401
|
0.27
|
2068
|
SG
|
Singapore
|
80
|
25253
|
0.16
|
6928
|
0.4
|
2068
|
CA
|
Canada
|
90
|
22858
|
0.27
|
14118
|
0.32
|
2068
|
ES
|
Spain
|
85
|
17136
|
0.16
|
14282
|
0.22
|
2068
|
DK
|
Denmark
|
89
|
16651
|
0.08
|
5844
|
0.24
|
2068
|
AR
|
Argentina
|
3
|
202
|
0.07
|
5724
|
0.21
|
2068
|
BR
|
Brazil
|
9
|
180
|
0.09
|
7912
|
0.26
|
2068
|
TR
|
Turkey
|
1
|
168
|
0.05
|
4127
|
0.19
|
2068
|
CN
|
China
|
4
|
19
|
0.1
|
7265
|
0.18
|
2068
|
PL
|
Poland
|
2
|
18
|
0.11
|
7267
|
0.23
|
2068
|
SA
|
Saudi Arabia
|
0
|
7
|
0.07
|
4037
|
0.26
|
2068
|
The Trouble with Bitmap Indexes
"There is a locking implication by design in bitmap indexes, the keys point to hundreds of rows and if I lock a single key entry in a bitmap index, I've locked the key entry for hundreds of other rows.That is WHY bitmaps are not recommended - if you update a bitmap indexed column OR you insert/delete from the table (which will modify the bitmap index), you'll find serialization at best and deadlocks all over the place at worst" (Tom Kyte, 2007).
You may also see the size of the bitmap index grow significantly and quickly.
Jonathan Lewis has written extensively on the subject of Bitmap Indexes on his blog:
So bitmap indexes were really only ever an option where you bulk load the data, and then the data is static while you query it. You might even drop the index prior to loading data and then rebuild it again afterward. They are fast to build because there is no sort involved.
In choosing whether to still use bitmaps indexes instead of Bloom filters, you have to decide whether the improved performance for a probably small number of queries that return a small volume of data is worth the challenges associated with bitmap indexes. In general, I think it probably isn't.
No comments :
Post a Comment