Implementing data cubes efficiently

Authors: Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman (Stanford University)

ACM SIGMOD Record, Volume 25, Issue 2, June 1996, pp. 205–216. https://doi.org/10.1145/235968.233333

Published: 01 June 1996


Abstract

Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
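The greedy selection described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: a hypercube lattice in which each view is identified by its set of grouping attributes, a cost model in which answering a query costs the row count of the smallest materialized view that can answer it, and a top view that is always materialized. The function names and the row counts in the usage note are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def hypercube_views(dims):
    """Return every group-by view over dims as a frozenset of attributes."""
    return [frozenset(c)
            for r in range(len(dims) + 1)
            for c in combinations(dims, r)]


def query_cost(view, materialized, size):
    """Rows scanned to answer `view` from its smallest materialized ancestor.

    A view can be answered from any materialized view whose grouping
    attributes are a superset of its own (assumed cost model).
    """
    return min(size[m] for m in materialized if view <= m)


def benefit(candidate, materialized, size, views):
    """Total cost reduction, over all views, of also materializing `candidate`."""
    return sum(
        max(0, query_cost(w, materialized, size) - size[candidate])
        for w in views
        if w <= candidate
    )


def total_cost(materialized, size, views):
    """Sum of query costs when each view is queried once."""
    return sum(query_cost(w, materialized, size) for w in views)


def greedy_select(dims, size, k):
    """Pick up to k views, beyond the top view, by largest benefit each round."""
    views = hypercube_views(dims)
    materialized = [frozenset(dims)]  # the top view is assumed always present
    for _ in range(min(k, len(views) - 1)):
        candidates = [v for v in views if v not in materialized]
        best = max(candidates,
                   key=lambda v: benefit(v, materialized, size, views))
        materialized.append(best)
    return materialized
```

As a usage example with hypothetical row counts for three dimensions `p`, `s`, and `c`: with `size = {frozenset(k): n for k, n in {"psc": 6000, "pc": 6000, "ps": 800, "sc": 6000, "p": 200, "s": 10, "c": 100, "": 1}.items()}`, calling `greedy_select("psc", size, 2)` picks the `ps` view first and the `c` view second, because views as large as the top view (`pc`, `sc`) yield no benefit under this cost model.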

Published in
ACM SIGMOD Record, Volume 25, Issue 2, June 1996, 557 pages
ISSN: 0163-5808
DOI: 10.1145/235968
Chairman: T. H. Merrett (McGill Univ.)
Editors: H. V. Jagadish, Inderpal Singh Mumick

SIGMOD '96: Proceedings of the 1996 ACM SIGMOD international conference on Management of data, June 1996, 560 pages
ISBN: 0897917944
DOI: 10.1145/233269
Editor: Jennifer Widom (Stanford Univ., Stanford, CA)

Copyright © 1996 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- article