There are several forms of compression to choose between, each with different characteristics: the resulting objects are different sizes, and they require different amounts of CPU to compress and decompress.
- Basic Compression: This is not really compression, but a form of block-level de-duplication. It has been available since Oracle 9i. It is available on all platforms.
- Hybrid Columnar Compression (HCC) is only available on Engineered Systems, the FS Flash Storage System and the ZFS Storage Appliance (ZFSSA). There are four levels of compression, using three different compression algorithms (the DDL syntax for each level is sketched after this list). HCC is frequently positioned as being highly effective at reducing the size of data warehouses, but it also has implications for performance.
- Query Low: LZO algorithm
- Query High: ZLIB algorithm
- Archive Low: ZLIB algorithm higher level with a larger block size.
- Archive High: BZIP2 algorithm. The higher compression ratio comes at the expense of significantly increased CPU overhead on decompression.
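As an illustration, the compression level is simply specified in the DDL. The sketch below uses made-up table names and 12c syntax; rows are only compressed when they are loaded in direct-path mode, and the HCC clauses are only accepted on storage that supports them.

```sql
-- Minimal sketch; table names are illustrative, not from the test system.
-- Rows are only compressed on direct-path loads (CTAS, INSERT /*+ APPEND */, etc.).
CREATE TABLE sales_basic ROW STORE COMPRESS BASIC             AS SELECT * FROM sales;
CREATE TABLE sales_qlow  COLUMN STORE COMPRESS FOR QUERY LOW  AS SELECT * FROM sales;
CREATE TABLE sales_qhigh COLUMN STORE COMPRESS FOR QUERY HIGH AS SELECT * FROM sales;

-- An existing segment can be rebuilt at a different level.
ALTER TABLE sales_qlow MOVE COLUMN STORE COMPRESS FOR ARCHIVE LOW;
```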
Further Reading:
- If you are not familiar with the various forms of compression and their licensing requirements, Franck Pachot’s blog Oracle compression, availability and licensing will set the scene.
- As early as 2016, Roger MacNichol pointed out in Compression in a well-balanced system that “since HCC was released …, DBAs have almost always used Query High”, but as CPU speeds have increased, “the time has come for a reassessment of [the] role [of] HCC Query Low”. This blog post describes just such a case.
- Oracle White Paper: Hybrid Columnar Compression (HCC) on Oracle Database 18c
My Test System
- SuperCluster M7 (i.e. an Engineered System), Oracle 12.1.0.2, 2-node RAC, running PeopleSoft Financials 9.2.
- In particular, this system runs a large number of complex PeopleSoft General Ledger nVision reports that use a number of reporting views on summary ledgers. There are materialized views built on the reporting views. The summary ledgers are updated nightly, and then the materialized views are fully refreshed in non-atomic mode, so the tables are truncated and repopulated in direct-path mode (a sketch of this pattern follows this list). Hence, they can be Hybrid Columnar Compressed without needing the Advanced Compression licence. The materialized views are not indexed, so they can only be full scanned, and those scans can be offloaded to the storage cells (i.e. smart scan).
- I have not investigated ARCHIVE compression.
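A minimal sketch of that refresh pattern, using made-up object names rather than the actual PeopleSoft objects: the materialized view is created with an HCC clause, and a non-atomic complete refresh truncates and repopulates it in direct-path mode, so the new rows come out compressed.

```sql
-- Hypothetical names; a sketch of the refresh pattern described above.
CREATE MATERIALIZED VIEW mv_ledger_rpt
COMPRESS FOR QUERY LOW
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS SELECT * FROM ledger_rpt_vw;

-- Non-atomic complete refresh: truncate + direct-path insert,
-- so the repopulated rows are Hybrid Columnar Compressed.
BEGIN
  DBMS_MVIEW.REFRESH(list => 'MV_LEDGER_RPT', method => 'C', atomic_refresh => FALSE);
END;
/
```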
Writing
Materialized Views

| Description | Size (MB) | Comp. % | Comp. Ratio | Elapsed | Total DB Time |
|-------------------|----------:|--------:|------------:|--------:|--------------:|
| No Compression | 151,868 | 0% | 1.0 | 1:01 | 18:13 |
| Basic Compression | 34,720 | 77% | 4.4 | 1:12 | 22:54 |
| Query Low | 22,666 | 85% | 6.7 | 1:00 | 18:27 |
| Query High | 14,086 | 91% | 10.8 | 1:05 | 22:56 |
- While basic compression does significantly reduce the size of the tables (4.4x), Hybrid Columnar Compression does better (6.7x – 10.8x). Segment sizes can be checked with a simple dictionary query (sketched after these notes).
- HCC Query Low has only a very small overhead while writing data. The materialized views are refreshed in parallel, and while the total DB time is slightly higher, the elapsed time is actually lower!
- Both basic compression and HCC Query High take about 20% longer to write the data in direct path mode.
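For reference, segment sizes like those in the table above can be obtained from the data dictionary. This is only a sketch; the owner and naming pattern are assumptions, not the actual PeopleSoft names.

```sql
-- Sketch: report the size of the materialized view segments after each rebuild.
-- Owner and naming pattern are assumptions.
SELECT segment_name
,      ROUND(SUM(bytes)/1024/1024) AS size_mb
FROM   dba_segments
WHERE  owner = 'SYSADM'
AND    segment_name LIKE 'PS%MV%'
GROUP  BY segment_name
ORDER  BY size_mb DESC;
```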
Reading
nVision DB Time

| Description | On CPU+CPU Wait | cell smart table scan | Others | Total | On CPU+CPU Wait (%) | cell smart table scan (%) | Others (%) |
|-------------------|-------:|------:|------:|-------:|----:|----:|----:|
| No Compression | 201:26 | 97:12 | 18:15 | 316:54 | 64% | 31% | 6% |
| Basic Compression | 466:13 | 66:03 | 31:07 | 563:24 | 83% | 12% | 6% |
| Query Low | 241:13 | 12:34 | 44:57 | 298:45 | 81% | 4% | 15% |
| Query High | 278:20 | 4:43 | 11:38 | 294:42 | 94% | 2% | 4% |
- As the level of compression of the materialized views increases, the time spent on cell smart table scan (physical read) decreases; this is mainly because the objects are smaller.
- The slightly surprising result is that the CPU overhead of basic compression is higher than either form of HCC, and more than twice that of the uncompressed tables. Hence it significantly increases the total database time.
- Both forms of Query HCC perform better than the uncompressed table: the increase in CPU time is more than compensated for by the reduction in time spent on physical I/O.
- The Query Low test was affected by factors external to the test (a lot more time was spent in other wait events); otherwise, it would have outperformed Query High. Looking at just CPU and physical read time, it is clearly the best option (an illustrative ASH breakdown query is sketched after these notes).
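For illustration only, a DB time breakdown of this shape could be produced from the AWR ASH data along the following lines. This is not the query behind the table above; the module filter, the time window and the 10-seconds-per-sample weighting are all assumptions.

```sql
-- Illustrative sketch: approximate DB time by category from AWR ASH samples.
-- Each DBA_HIST_ACTIVE_SESS_HISTORY sample represents roughly 10 seconds of DB time.
SELECT CASE
         WHEN session_state = 'ON CPU'           THEN 'On CPU+CPU Wait'
         WHEN event = 'cell smart table scan'    THEN 'cell smart table scan'
         ELSE 'Others'
       END                                       AS time_category
,      ROUND(COUNT(*) * 10 / 60)                 AS db_time_mins
FROM   dba_hist_active_sess_history
WHERE  sample_time >= TRUNC(SYSDATE)             -- test window (assumption)
AND    module LIKE 'RPTBOOK%'                    -- nVision sessions (assumption)
GROUP  BY CASE
            WHEN session_state = 'ON CPU'        THEN 'On CPU+CPU Wait'
            WHEN event = 'cell smart table scan' THEN 'cell smart table scan'
            ELSE 'Others'
          END
ORDER  BY db_time_mins DESC;
```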
In an offloaded (i.e. smart) scan on an engineered system, decompression, as well as the physical I/O, is done in the storage cell. The saving in time spent on physical I/O between Query Low and Query High is small because the disk subsystem performs so well on the engineered system.
On a non-engineered system, by contrast, compression, decompression and I/O must all be done by the database server; there is nowhere else to do them. You may well find that a different level of compression is optimal there.
Conclusion
- Basic compression exhibits a surprisingly high CPU overhead. In my test, the performance was poorer than with uncompressed tables.
- HCC Query Low was generally the best option: the saving in I/O more than paid for the increase in CPU.
- I also analysed the ASH data for the different tests broken down by materialized view (a sketch of such a query follows). I found it was better to leave the smaller materialized views uncompressed but to compress the larger ones. This is not surprising since HCC adds additional structures.
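A per-object breakdown of the kind mentioned in the last bullet might be approximated by grouping ASH samples on the current object, for example (a sketch; the owner filter is an assumption):

```sql
-- Sketch: DB time attributed to each materialized view during smart scans.
SELECT o.object_name
,      ROUND(COUNT(*) * 10 / 60) AS db_time_mins
FROM   dba_hist_active_sess_history h
,      dba_objects                  o
WHERE  o.object_id = h.current_obj#
AND    o.owner     = 'SYSADM'                    -- assumption
AND    h.event     = 'cell smart table scan'
GROUP  BY o.object_name
ORDER  BY db_time_mins DESC;
```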
2 comments:
Hello David
Very informative, thanks.
To complement your post:
- HCC is also available in all Enterprise Edition editions with the Oracle Database Cloud Service.
- Did you try to increase the compression ratios using Clustering Attributes and Zone Maps (which would require the Partitioning option)? These have been available since 12.1.0.2.
Thanks
I have examined clustering by linear order in 12.1.0.2. I ordered segments by columns that are frequently used in equality predicates, have multiple distinct values, and are not partitioning columns.
I found that clustering does improve the compression ratio of Hybrid Columnar Compressed objects, without apparently increasing the CPU overhead of decompression. There was no advantage for segments that were not compressed. However, caution is required because clustering sorts the data, which can be a significant overhead on larger segments when the data is loaded.
In my tests, I only clustered smaller objects. I did see an improvement in query performance for only a small increase in load time. I did not test zone maps because I was not able to get them to work in conjunction with fast-refreshing materialized views in non-atomic mode.
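For completeness, a sketch of the 12c attribute clustering syntax being discussed; the table and column names are made up, and the compression level shown is only an example.

```sql
-- Sketch only: attribute clustering by linear order combined with HCC.
-- Names are illustrative; rows are only reordered on direct-path loads and data movement.
CREATE TABLE ledger_rpt
COLUMN STORE COMPRESS FOR QUERY LOW
CLUSTERING BY LINEAR ORDER (business_unit, account)
AS SELECT * FROM ledger_src;
```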