Compressing Data Cube in Parallel OLAP Systems


Sangster Award Recipient Bo-Yong Liang, PhD candidate, Carleton University
Data warehouses provide the primary support for Decision Support Systems (DSS) and Business Intelligence (BI) systems. One of the most interesting recent themes in this area has been the computation and manipulation of the data cube, a relational model that can be used to support On-Line Analytical Processing (OLAP). In the context of massive data volumes, data cube computation must be highly efficient in terms of both speed and storage space. Many research studies have shown that parallel computation effectively speeds up data cube construction. Data cube compression, on the other hand, is not only crucial for computing and storing data cubes in limited space, but also reduces I/O access time. Though compression algorithms are quite common in the literature, most are poorly suited to database/cube environments since they (i) offer relatively poor compression ratios or (ii) incur significant run-time penalties.


My work both proposes and implements an efficient algorithm that compresses the cubes during parallel data cube generation. This low-overhead compression mechanism provides block-by-block and record-by-record compression using tuple difference coding techniques, thereby maximizing the compression ratio while minimizing the decompression penalty at run time.
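The core idea of tuple difference coding is that sorted cube records differ only slightly from their neighbors, so each record can be stored as a small gap rather than a full tuple. The sketch below illustrates this idea only; the function names, the mixed-radix linearization, and the gap encoding are illustrative assumptions, not the thesis's actual algorithm.

```python
# Illustrative sketch of tuple difference coding (not the thesis's code).
# Assumes each record is a tuple of non-negative integer dimension values,
# with known per-dimension cardinalities `cards`.

def linearize(tup, cards):
    """Map a dimension-value tuple to a single integer via mixed-radix arithmetic."""
    idx = 0
    for v, c in zip(tup, cards):
        idx = idx * c + v
    return idx

def delinearize(idx, cards):
    """Inverse of linearize: recover the dimension-value tuple."""
    vals = []
    for c in reversed(cards):
        vals.append(idx % c)
        idx //= c
    return tuple(reversed(vals))

def compress(records, cards):
    """Encode sorted records as gaps between consecutive linearized indices.

    The gaps are typically small integers, which a real implementation
    would pack into a few bits each; here we just keep them as ints.
    """
    idxs = sorted(linearize(r, cards) for r in records)
    return [idxs[0]] + [b - a for a, b in zip(idxs, idxs[1:])]

def decompress(deltas, cards):
    """Rebuild the full records by accumulating the gaps."""
    out, idx = [], 0
    for d in deltas:
        idx += d
        out.append(delinearize(idx, cards))
    return out
```

Because decompression is a single pass of additions, records can be recovered block by block with essentially no run-time penalty, which is the property the thesis's mechanism exploits.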


The experimental results demonstrate that the typical compression ratio is about 30:1 without sacrificing run time. My work also proposes two data cube computation algorithms (random query and sub-cube construction) that operate directly on the compressed data. Moreover, my work demonstrates that the compression method is also suitable for the Hilbert space-filling curve, a mechanism widely used in multi-dimensional indexing.
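A Hilbert space-filling curve maps multi-dimensional cell coordinates onto a one-dimensional ordering in which consecutive positions are always adjacent cells, so neighboring tuples stay close together and their differences stay small. As a rough illustration only (this is the standard 2-D distance-to-coordinate conversion, not code from the thesis):

```python
# Standard 2-D Hilbert curve conversion: distance d along the curve -> (x, y).
# `n` is the grid side length and must be a power of two.

def hilbert_d2xy(n, d):
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:          # rotate the quadrant when needed
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Sorting cube records by their position along such a curve keeps multi-dimensionally close records near each other in the sorted sequence, which is why tuple difference coding remains effective under Hilbert ordering.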