There are several forms of compression to choose between, each with different characteristics: the resulting objects are different sizes, and they require different amounts of CPU to compress and decompress.
- Basic Compression: This is not really compression, but a form of block-level de-duplication. It has been available since Oracle 9i. It is available on all platforms.
- Hybrid Columnar Compression (HCC) is only available on Engineered Systems, the FS Flash Storage System and the ZFS Storage Appliance (ZFSSA). There are four levels of compression, using three different compression algorithms (the DDL syntax for each level is sketched after this list). HCC is frequently positioned as being highly effective at reducing the size of data warehouses, but it also has implications for performance.
- Query Low: LZO algorithm
- Query High: ZLIB algorithm
- Archive Low: ZLIB algorithm higher level with a larger block size.
- Archive High: BZIP2 algorithm. The higher compression ratio comes at the expense of significantly increased CPU overhead on decompression.
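As an illustration, the compression level is simply specified in the DDL. The sketch below uses made-up table names and 12c syntax; rows are only compressed when they are loaded in direct-path mode, and the HCC clauses are only accepted on storage that supports them.

```sql
-- Minimal sketch; table names are illustrative, not from the test system.
-- Rows are only compressed on direct-path loads (CTAS, INSERT /*+ APPEND */, etc.).
CREATE TABLE sales_basic ROW STORE COMPRESS BASIC             AS SELECT * FROM sales;
CREATE TABLE sales_qlow  COLUMN STORE COMPRESS FOR QUERY LOW  AS SELECT * FROM sales;
CREATE TABLE sales_qhigh COLUMN STORE COMPRESS FOR QUERY HIGH AS SELECT * FROM sales;

-- An existing segment can be rebuilt at a different level.
ALTER TABLE sales_qlow MOVE COLUMN STORE COMPRESS FOR ARCHIVE LOW;
```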
Further Reading:
- If you are not familiar with the various forms of compression and their licensing requirements, Franck Pachot’s blog Oracle compression, availability and licensing will set the scene.
- As early as 2016, Roger MacNichol pointed out in Compression in a well-balanced system that “since HCC was released …, DBAs have almost always used Query High”, but as CPU speeds have increased, “the time has come for a reassessment of [the] role [of] HCC Query Low”. This blog post describes just such a case.
- Oracle White Paper: Hybrid Columnar Compression (HCC) on Oracle Database 18c
My Test System
- SuperCluster M7 (i.e. an Engineered System), Oracle 12.1.0.2, 2-node RAC, running PeopleSoft Financials 9.2.
- In particular, this system runs a large number of complex PeopleSoft General Ledger nVision reports that use a number of reporting views on summary ledgers. There are materialized views built on the reporting views. The summary ledgers are updated nightly, and then the materialized views are fully refreshed in non-atomic mode, so the tables are truncated and repopulated in direct-path mode (a sketch of this pattern follows this list). Hence, they can be Hybrid Columnar Compressed without needing the Advanced Compression licence. The materialized views are not indexed, so they can only be full scanned, and those scans can be offloaded to the storage cells (i.e. smart scan).
- I have not investigated ARCHIVE compression.
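A minimal sketch of that refresh pattern, using made-up object names rather than the actual PeopleSoft objects: the materialized view is created with an HCC clause, and a non-atomic complete refresh truncates and repopulates it in direct-path mode, so the new rows come out compressed.

```sql
-- Hypothetical names; a sketch of the refresh pattern described above.
CREATE MATERIALIZED VIEW mv_ledger_rpt
COMPRESS FOR QUERY LOW
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS SELECT * FROM ledger_rpt_vw;

-- Non-atomic complete refresh: truncate + direct-path insert,
-- so the repopulated rows are Hybrid Columnar Compressed.
BEGIN
  DBMS_MVIEW.REFRESH(list => 'MV_LEDGER_RPT', method => 'C', atomic_refresh => FALSE);
END;
/
```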
Writing
Materialized Views

| Description | Size (MB) | Comp. % | Comp. Ratio | Elapsed | Total DB Time |
|-------------------|----------:|--------:|------------:|--------:|--------------:|
| No Compression | 151,868 | 0% | 1.0 | 1:01 | 18:13 |
| Basic Compression | 34,720 | 77% | 4.4 | 1:12 | 22:54 |
| Query Low | 22,666 | 85% | 6.7 | 1:00 | 18:27 |
| Query High | 14,086 | 91% | 10.8 | 1:05 | 22:56 |
- While basic compression does significantly reduce the size of the tables (4.4x), Hybrid Columnar Compression does better (6.7x – 10.8x). Segment sizes can be checked with a simple dictionary query (sketched after these notes).
- HCC Query Low has only a very small overhead while writing data. The materialized views are refreshed in parallel, and while the total DB time is slightly higher, the elapsed time is actually lower!
- Both basic compression and HCC Query High take about 20% longer to write the data in direct path mode.
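For reference, segment sizes like those in the table above can be obtained from the data dictionary. This is only a sketch; the owner and naming pattern are assumptions, not the actual PeopleSoft names.

```sql
-- Sketch: report the size of the materialized view segments after each rebuild.
-- Owner and naming pattern are assumptions.
SELECT segment_name
,      ROUND(SUM(bytes)/1024/1024) AS size_mb
FROM   dba_segments
WHERE  owner = 'SYSADM'
AND    segment_name LIKE 'PS%MV%'
GROUP  BY segment_name
ORDER  BY size_mb DESC;
```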
Reading
nVision DB Time

| Description | On CPU+CPU Wait | cell smart table scan | Others | Total | On CPU+CPU Wait (%) | cell smart table scan (%) | Others (%) |
|-------------------|-------:|------:|------:|-------:|----:|----:|----:|
| No Compression | 201:26 | 97:12 | 18:15 | 316:54 | 64% | 31% | 6% |
| Basic Compression | 466:13 | 66:03 | 31:07 | 563:24 | 83% | 12% | 6% |
| Query Low | 241:13 | 12:34 | 44:57 | 298:45 | 81% | 4% | 15% |
| Query High | 278:20 | 4:43 | 11:38 | 294:42 | 94% | 2% | 4% |
- As the level of compression of the materialized views increases, the time spent on cell smart table scan (physical read) decreases; this is mainly because the objects are smaller.
- The slightly surprising result is that the CPU overhead of basic compression is higher than either form of HCC, and more than twice that of the uncompressed tables. Hence it significantly increases the total database time.
- Both forms of Query HCC perform better than the uncompressed table: the increase in CPU time is more than compensated for by the reduction in time spent on physical I/O.
- The Query Low test was affected by factors external to the test (a lot more time was spent in other wait events); otherwise, it would have outperformed Query High. Looking at just CPU and physical read time, it is clearly the best option (an illustrative ASH breakdown query is sketched after these notes).
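For illustration only, a DB time breakdown of this shape could be produced from the AWR ASH data along the following lines. This is not the query behind the table above; the module filter, the time window and the 10-seconds-per-sample weighting are all assumptions.

```sql
-- Illustrative sketch: approximate DB time by category from AWR ASH samples.
-- Each DBA_HIST_ACTIVE_SESS_HISTORY sample represents roughly 10 seconds of DB time.
SELECT CASE
         WHEN session_state = 'ON CPU'           THEN 'On CPU+CPU Wait'
         WHEN event = 'cell smart table scan'    THEN 'cell smart table scan'
         ELSE 'Others'
       END                                       AS time_category
,      ROUND(COUNT(*) * 10 / 60)                 AS db_time_mins
FROM   dba_hist_active_sess_history
WHERE  sample_time >= TRUNC(SYSDATE)             -- test window (assumption)
AND    module LIKE 'RPTBOOK%'                    -- nVision sessions (assumption)
GROUP  BY CASE
            WHEN session_state = 'ON CPU'        THEN 'On CPU+CPU Wait'
            WHEN event = 'cell smart table scan' THEN 'cell smart table scan'
            ELSE 'Others'
          END
ORDER  BY db_time_mins DESC;
```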
In an offloaded (i.e. smart) scan on an engineered system, decompression, as well as the physical I/O, is done in the storage cell. The saving in time spent on physical I/O between Query Low and Query High is small because the disk subsystem performs so well on the engineered system.
On a non-engineered system, by contrast, compression, decompression and I/O must all be done by the database server; there is nowhere else to do them. You may well find that a different level of compression is optimal there.
Conclusion
- Basic compression exhibits a surprisingly high CPU overhead. In my test, the performance was poorer than with uncompressed tables.
- HCC Query Low was generally the best option: the saving in I/O more than paid for the increase in CPU.
- I also analysed the ASH data for the different tests broken down by materialized view (a sketch of such a query follows). I found it was better to leave the smaller materialized views uncompressed but to compress the larger ones. This is not surprising since HCC adds additional structures.
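A per-object breakdown of the kind mentioned in the last bullet might be approximated by grouping ASH samples on the current object, for example (a sketch; the owner filter is an assumption):

```sql
-- Sketch: DB time attributed to each materialized view during smart scans.
SELECT o.object_name
,      ROUND(COUNT(*) * 10 / 60) AS db_time_mins
FROM   dba_hist_active_sess_history h
,      dba_objects                  o
WHERE  o.object_id = h.current_obj#
AND    o.owner     = 'SYSADM'                    -- assumption
AND    h.event     = 'cell smart table scan'
GROUP  BY o.object_name
ORDER  BY db_time_mins DESC;
```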
2 comments:
Hello David
Very informative, thanks.
To complement your post:
- HCC is also available in all Enterprise Edition editions with the Oracle Database Cloud Service.
- Did you try to increase the compression ratios using Clustering Attributes and Zone Maps (which would require the Partitioning option)? These have been available since 12.1.0.2.
Thanks
I have examined clustering by linear order in 12.1.0.2. I ordered segments by columns that are frequently used in equality predicates, have multiple distinct values, and are not partitioning columns.
I found that clustering does improve the compression ratio of Hybrid Columnar Compressed objects, without apparently increasing the CPU overhead of decompression. There was no advantage for segments that were not compressed. However, caution is required because clustering sorts the data, which can be a significant overhead on larger segments when the data is loaded.
In my tests, I only clustered smaller objects. I did see an improvement in query performance for only a small increase in load time. I did not test zone maps because I was not able to get them to work in conjunction with fast-refreshing materialized views in non-atomic mode.
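For completeness, a sketch of the 12c attribute clustering syntax being discussed; the table and column names are made up, and the compression level shown is only an example.

```sql
-- Sketch only: attribute clustering by linear order combined with HCC.
-- Names are illustrative; rows are only reordered on direct-path loads and data movement.
CREATE TABLE ledger_rpt
COLUMN STORE COMPRESS FOR QUERY LOW
CLUSTERING BY LINEAR ORDER (business_unit, account)
AS SELECT * FROM ledger_src;
```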