Hybrid Columnar Compression (HCC) on Exadata enables the highest levels of data compression and provides enterprises with significant cost savings and performance improvements due to reduced I/O. HCC is optimized to use both database and storage capabilities on Exadata to deliver large space savings and strong query performance at the same time. Average storage savings range from 10x to 15x depending on which Hybrid Columnar Compression level is implemented; real-world customer benchmarks have produced storage savings of up to 204x.

 

Hybrid Columnar Compression is an enabling technology for both Warehouse Compression and Archive Compression. We will discuss each of these capabilities in detail later in this paper, but first let's explore the implementation and benefits of Hybrid Columnar Compression on Exadata, the next generation in compression technology.

 

How Hybrid Columnar Compression Works

Traditionally, data has been organized within a database block in a ‘row’ format, where all column data for a particular row is stored sequentially within a single database block. Having data from columns with different data types stored close together limits the amount of storage savings achievable with compression technology. An alternative approach is to store data in a ‘columnar’ format, where data is organized and stored by column. Storing column data together, with the same data type and similar characteristics, dramatically increases the storage savings achieved from compression. However, storing data in this manner can negatively impact database performance when application queries access more than one or two columns, perform even a modest number of updates, or insert small numbers of rows per transaction.

As many of you know by now, Exadata comes with an exclusive way to compress data based on columns, using a different storage structure called the "Compression Unit", and it goes by the name of Hybrid Columnar Compression (HCC).

 

Conceptual illustration of a Logical Compression Unit

 

Exadata Hybrid Columnar Compression is a second-generation columnar technology that combines the best of row and column formats:

•   Best compression – matching full columnar

•   Excellent scan time – 93% as good as full columnar

•   Good single row lookup – no full columnar “cliff”

•   Row format remains best for workloads with updates or trickle feeds

 

 

Warehouse Compression provides two levels of compression: LOW and HIGH.

•   HIGH typically provides a 10x reduction in storage

•   LOW typically provides a 6x reduction

•   Both levels have been optimized to increase scan query performance by taking advantage of the smaller number of blocks on disk

•   To maximize the storage savings and query performance benefits of Warehouse Compression, the default level is HIGH

•   LOW should be chosen for environments where load time service levels are more critical than query performance

 

 

Warehouse Compression

Warehouse Compression provides significant storage savings by leveraging Hybrid Columnar Compression technology. Warehouse Compression typically provides a 10:1 (10x) compression ratio, delivering roughly five times the industry average savings. For example, enabling Warehouse Compression on an uncompressed 100 terabyte data warehouse would reduce the storage requirements to only 10 terabytes. Warehouse Compression would return 90 terabytes of storage back to the enterprise for other uses. In fact, the enterprise could use this reclaimed storage to support the growth of its data warehouse without purchasing additional storage for over 4 years, assuming the database doubled in size every two years. Clearly, storage savings of this magnitude dramatically reduce costs, as enterprises can delay storage purchases for many years.

• 10x average storage savings

• 100 TB database compresses to 10 TB

• Reclaim 90 TB of disk space

• Space for 9 more '100 TB' databases

• 10x average scan improvement

• 1,000 IOPS reduced to 100 IOPS
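The growth arithmetic behind the multi-year claim above can be sketched directly in SQL (a hypothetical projection, assuming a constant 10x compression ratio and data volume doubling every two years):

```sql
-- Hypothetical projection: a 100 TB warehouse compresses to 10 TB.
-- Doubling every two years, the compressed footprint does not grow
-- back to the original 100 TB of storage until after year 6.
select level * 2             as year,
       10 * power(2, level)  as compressed_tb
from dual
connect by level <= 4;
-- year 2: 20 TB, year 4: 40 TB, year 6: 80 TB, year 8: 160 TB
```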

 

 

 

 

 

Archive Compression

Archive Compression provides significant storage savings by leveraging Hybrid Columnar Compression technology. Archive Compression is optimized to maximize storage savings, typically achieving a compression ratio of 15:1 (15x). That is, an uncompressed table or partition would require 15x more storage than a table or partition using Archive Compression. In contrast to Warehouse Compression, Archive Compression is a pure storage-saving technology. Tables or partitions utilizing Archive Compression will typically experience a decrease in performance, a consequence of the compression algorithm being optimized for maximum storage savings. Therefore, Archive Compression is intended for tables or partitions that store data that is rarely accessed.

 

• Compression algorithm optimized for max storage savings

• Benefits any application with data retention requirements

• Best approach for ILM and data archival

• Minimum storage footprint

• No need to move data to tape or less expensive disks

• Data is always online and always accessible

• Intended for tables or partitions that store data that is rarely accessed

• Run queries against historical data (without recovering from tape)

• Update historical data

• Supports schema evolution (add/drop columns)
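As an illustration of the ILM approach described above, an aged partition can be rebuilt in place with Archive Compression (a sketch; the SALES table and partition name are hypothetical):

```sql
-- Recompress a partition of rarely accessed historical data.
alter table sales move partition sales_2010_q1
  compress for archive high;

-- The move marks local index partitions unusable; rebuild them.
alter table sales modify partition sales_2010_q1
  rebuild unusable local indexes;
```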

 

Below are some best practices and considerations when using Hybrid Columnar Compression:

 

• The best test environment for Hybrid Columnar Compression is one that most closely duplicates the production environment; this will provide the most realistic (pre- and post-compression) performance comparisons.

 

• Hybrid Columnar Compression is NOT supported for use with the LONG data types, and the use of UNIFORM EXTENTS is not recommended with Hybrid Columnar Compression.

 

• Hybrid Columnar Compression is designed for relational data, not for unstructured data in BLOBs (or CLOBs). LOBs are best stored in the Oracle Database as SecureFiles LOBs, and if the customer has licensed the Advanced Compression Option, they can use SecureFiles Compression and Deduplication to potentially reduce the amount of storage required for LOBs.

 

• Hybrid Columnar Compression is primarily intended for QUERY-intensive data as well as archive/historic data. DML operations (INSERT/UPDATE) against a Hybrid Columnar Compressed table/partition can reduce the overall compression savings over time, since data inserted or updated via DML operations will not be compressed to the same ratio as data that is bulk loaded.
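Before committing to a compression level, the ratio a given table would achieve can be estimated with the Compression Advisor in Oracle's DBMS_COMPRESSION package. Below is a sketch of its use; the scratch tablespace name and the target table are placeholders from the tests later in this article:

```sql
set serveroutput on
declare
  l_blkcnt_cmp    pls_integer;
  l_blkcnt_uncmp  pls_integer;
  l_row_cmp       pls_integer;
  l_row_uncmp     pls_integer;
  l_cmp_ratio     number;
  l_comptype_str  varchar2(100);
begin
  dbms_compression.get_compression_ratio(
    scratchtbsname => 'USERS',   -- placeholder scratch tablespace
    ownname        => user,
    tabname        => 'ANAR_TEST',
    partname       => null,
    comptype       => dbms_compression.comp_for_query_high,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  dbms_output.put_line('Estimated ratio: ' || round(l_cmp_ratio, 1) ||
                       ' (' || l_comptype_str || ')');
end;
/
```

The advisor samples the table into the scratch tablespace and compresses the sample, so the estimate is close to what an actual bulk load would achieve.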

 

 


 

Now let's move on to our tests.

I created a table named ANAR_TEST that is 3.8 GB in size.

 

SQL> select segment_name, bytes/1024/1024 as size_MB from dba_segments where segment_name='ANAR_TEST';

 

SEGMENT_NAME    SIZE_MB
------------ ----------
ANAR_TEST          3809

 

 

SQL> select count(*) from ANAR_TEST;

  COUNT(*)
----------
  22919193

 

First, let’s try the BASIC compression method. To do this, use the COMPRESS keyword.

 

SQL> create table ANAR_TEST_basic nologging compress as select * from ANAR_TEST where 1=2;

Table created.

SQL> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='ANAR_TEST_BASIC';

 

TABLE_NAME       LOGGING COMPRESS COMPRESS_FOR
---------------- ------- -------- ------------
ANAR_TEST_BASIC  NO      ENABLED  BASIC

 

SQL> set timing on

SQL> insert /*+ append */ into ANAR_TEST_basic select * from ANAR_TEST;

22919193 rows created.

Elapsed: 00:01:32.28

SQL> commit;

Commit complete.

Elapsed: 00:00:00.02

 

Next, let's use the COMPRESS FOR OLTP keyword for OLTP compression.

 

SQL> create table ANAR_TEST_oltp compress for oltp as select * from ANAR_TEST where 1=2;

Table created.

SQL> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='ANAR_TEST_OLTP';

 

TABLE_NAME      LOGGING COMPRESS COMPRESS_FOR
--------------- ------- -------- ------------
ANAR_TEST_OLTP  NO      ENABLED  OLTP

 

SQL> insert into ANAR_TEST_oltp select * from ANAR_TEST;

22919193 rows created.

Elapsed: 00:04:27.09

SQL> commit;

Commit complete.

Elapsed: 00:00:00.00

 

Now let's move on to the Exadata compression methods. Let's try the COMPRESS FOR QUERY LOW keyword.

 

SQL> create table ANAR_TEST_query_low nologging compress for query low as select * from ANAR_TEST where 1=2;

Table created.

 

SQL> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='ANAR_TEST_QUERY_LOW';

TABLE_NAME           LOGGING COMPRESS COMPRESS_FOR
-------------------- ------- -------- ------------
ANAR_TEST_QUERY_LOW  NO      ENABLED  QUERY LOW

 

SQL> insert /*+ append */ into ANAR_TEST_query_low select * from ANAR_TEST;

22919193 rows created.

Elapsed: 00:01:04.92

SQL> commit;

Commit complete.

Elapsed: 00:00:00.01

 

If we want to save more space, we need to use the COMPRESS FOR QUERY HIGH method, but here the insert time will be longer.

SQL> create table ANAR_TEST_query_high nologging compress for query high as select * from ANAR_TEST where 1=2;

Table created.

SQL> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='ANAR_TEST_QUERY_HIGH';

 

TABLE_NAME            LOGGING COMPRESS COMPRESS_FOR
--------------------- ------- -------- ------------
ANAR_TEST_QUERY_HIGH  NO      ENABLED  QUERY HIGH

 

SQL> insert /*+ append */ into ANAR_TEST_query_high select * from ANAR_TEST;

22919193 rows created.

Elapsed: 00:02:16.49

SQL> commit;

Commit complete.

Elapsed: 00:00:00.01

 

If we have archival data, we can use the COMPRESS FOR ARCHIVE methods to save even more space. Let's start with COMPRESS FOR ARCHIVE LOW.

 

SQL> create table ANAR_TEST_archive_low nologging compress for archive low as select * from ANAR_TEST where 1=2;

Table created.

SQL> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='ANAR_TEST_ARCHIVE_LOW';

 

TABLE_NAME             LOGGING COMPRESS COMPRESS_FOR
---------------------- ------- -------- ------------
ANAR_TEST_ARCHIVE_LOW  NO      ENABLED  ARCHIVE LOW

 

SQL> insert /*+ append */ into ANAR_TEST_archive_low select * from ANAR_TEST;

22919193 rows created.

Elapsed: 00:03:05.70

SQL> commit;

Commit complete.

Elapsed: 00:00:00.01

 

If load time is not so important and we want to save the most space, then we use the COMPRESS FOR ARCHIVE HIGH keyword.

 

SQL> create table ANAR_TEST_archive_high nologging compress for archive high as select * from ANAR_TEST where 1=2;

Table created.

 

SQL> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='ANAR_TEST_ARCHIVE_HIGH';

 

TABLE_NAME              LOGGING COMPRESS COMPRESS_FOR
----------------------- ------- -------- ------------
ANAR_TEST_ARCHIVE_HIGH  NO      ENABLED  ARCHIVE HIGH

 

SQL> insert /*+ append */ into ANAR_TEST_archive_high select * from ANAR_TEST;

22919193 rows created.

Elapsed: 00:12:02.97

SQL> commit;

Commit complete.

Elapsed: 00:00:00.01

 

Finally, let's see the table sizes.

 

SQL> set line 1000

SQL> select segment_name,bytes/1024/1024 as size_MB from dba_segments
  2  where segment_name like 'ANAR_TEST%' order by 1;

 

SEGMENT_NAME             SIZE_MB
----------------------   -------

ANAR_TEST                3809

ANAR_TEST_ARCHIVE_HIGH   424

ANAR_TEST_ARCHIVE_LOW    488

ANAR_TEST_BASIC          2500

ANAR_TEST_OLTP           2997

ANAR_TEST_QUERY_HIGH     512

ANAR_TEST_QUERY_LOW      856

7 rows selected.

 

Let me add the load times, and our table looks as follows.

 

SEGMENT_NAME             SIZE_MB    TIME
----------------------   -------    -----------

ANAR_TEST                3809

ANAR_TEST_ARCHIVE_HIGH   424        00:12:02.97

ANAR_TEST_ARCHIVE_LOW    488        00:03:05.70

ANAR_TEST_BASIC          2500       00:01:32.28

ANAR_TEST_OLTP           2997       00:04:27.09

ANAR_TEST_QUERY_HIGH     512        00:02:16.49

ANAR_TEST_QUERY_LOW      856        00:01:04.92
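The compression ratios implied by these sizes can be computed directly from dba_segments (a sketch against the test tables above, dividing each compressed copy into the uncompressed baseline):

```sql
-- Ratio of the uncompressed ANAR_TEST to each compressed copy.
select s.segment_name,
       round(s.bytes / 1024 / 1024)    as size_mb,
       round(base.bytes / s.bytes, 1)  as compression_ratio
from   dba_segments s,
       (select bytes from dba_segments
        where  segment_name = 'ANAR_TEST') base
where  s.segment_name like 'ANAR_TEST%'
order  by s.segment_name;
-- e.g. ARCHIVE HIGH: 3809/424 ~ 9.0x, QUERY LOW: 3809/856 ~ 4.4x
```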

 

What if we get this question: how to disable Exadata Hybrid Columnar Compression? ☺ Let's test that, too.

 

1) Create a partitioned table and define each partition with the desired compression option:

oracle@bakuexa1dbadm01:/$ sqlplus "/ as sysdba";

SQL*Plus: Release 11.2.0.3.0 Production on Mon Sep 14 10:05:00 2015

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,

Data Mining and Real Application Testing options

SQL> create table orders (cid numeric(10))

partition by range (cid)

(partition p1 values less than (100000) nocompress,

partition p2 values less than (200000) compress for archive low,

partition p3 values less than (300000) compress for query high,

partition p4 values less than (maxvalue) compress for query low)

enable row movement;

Table created.

 

2) Display the table structure to verify the compression option associated with each partition:

 

SQL> select PARTITION_NAME, COMPRESSION, COMPRESS_FOR
  2  from DBA_tab_partitions
  3  where TABLE_NAME like 'ORDERS';

PARTITION_NAME                 COMPRESS COMPRESS_FOR
------------------------------ -------- ------------
P1                             DISABLED
P2                             ENABLED  ARCHIVE LOW
P3                             ENABLED  QUERY HIGH
P4                             ENABLED  QUERY LOW

 

3) Now disable the HCC compression on the desired partition:

 

SQL> alter table ORDERS modify partition P2 nocompress;

 

Table altered.

 

4) Verify the HCC compression was disabled:

SQL> select PARTITION_NAME, COMPRESSION, COMPRESS_FOR
  2  from DBA_tab_partitions
  3  where TABLE_NAME like 'ORDERS';

 

PARTITION_NAME                 COMPRESS COMPRESS_FOR
------------------------------ -------- ------------
P1                             DISABLED
P2                             DISABLED              <==
P3                             ENABLED  QUERY HIGH
P4                             ENABLED  QUERY LOW
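One caveat worth noting here (general ALTER TABLE behavior in Oracle, not something exercised in the test above): MODIFY PARTITION ... NOCOMPRESS only changes the compression attribute for data loaded in the future. Rows already stored in HCC format stay compressed until the partition is rebuilt, for example:

```sql
-- Physically decompress the existing rows by rebuilding the partition.
alter table ORDERS move partition P2 nocompress;

-- The move marks local index partitions unusable; rebuild them.
alter table ORDERS modify partition P2 rebuild unusable local indexes;
```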

 

Conclusion

HCC delivers better compression ratios than the block compression methods. Load time increases from QUERY LOW (best) through QUERY HIGH and ARCHIVE LOW (both moderate) to ARCHIVE HIGH (longest load time). Query performance decreases similarly, from QUERY LOW (best) to ARCHIVE HIGH (longest query time). All HCC methods except ARCHIVE HIGH delivered better query performance than uncompressed, though. In short: HCC is most likely able to save large amounts of space in your data warehouse without decreasing query performance much, if at all, and it will probably compress your archival data significantly.

 

otech magazine - oracle open world 2015  copyright otech magazine 2015

Exadata Hybrid Columnar Compression Testing and Compression Methods on Exadata X4