Online aggregation

Authors:
Joseph M. Hellerstein

Computer Science Division, University of California, Berkeley

Computer Science Division, University of California, Berkeley
View Profile

,
Peter J. Haas

Almaden Research Center, IBM Research Division

Almaden Research Center, IBM Research Division
View Profile

,
Helen J. Wang

Computer Science Division, University of California, Berkeley

Computer Science Division, University of California, Berkeley
View Profile

Authors Info & Claims

ACM SIGMOD Record Volume 26 Issue 2June 1997pp 171–182https://doi.org/10.1145/253262.253291

Published:01 June 1997Publication History

ACM SIGMOD Record

Abstract

Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in POSTGRES.

References

AAD+96 A. Agarwal, R. Agrawal, P. hi. Deshpande, A. Gupta, J. F. Naughton, F{. F{amakrishnan. and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 22nd Intl. Conf. Very Large Data Bases, Mumbai(Bombay), September 1996. Google ScholarDigital Library
ACSW96 A. Aiken, J. Chen, M. Stonebraker, and A. Woodruff. Tioga-2: A direct manipulation database visualization environment. In Proc. of the I 2th Intl. Conf. Data Engineering, pages 208-217, New Orleans, February 1996. Google ScholarDigital Library
AZ96 G. Antoshenkov and M Ziauddin. Query processing and optimization in Oracle Rdb. VLDB Journal, 5(4):229-237. 1996. Google ScholarDigital Library
Bil86 P. Billingsley. Probability and Measure. Wiley, New York, second edition, 1986.Google Scholar
BM96 R. J. Bayardo, Jr. and D. P. Miranker. Processing queries for first-few answers. In Fifth Intl. Con}. Information and Knowledge Management, pages 45-52, Rockville, Maryland, 1996. Google ScholarDigital Library
Bra84 K. Bratbergsengen. Hashing methods and relational algebra operations. In Proc. I Oth Intl. Conf. Very Large Data Bases, pages 323-333, Singapore, August 1984. Google ScholarDigital Library
CCS93 E. F. Codd, S. B. Codd, and C. T. Sal- Icy. Providing OLAP (on-line analyticalprocessing) to user-analysts: an IT mandate.URL http://www.arborsoft.com/papers/coddTOC.htmI., I993.Google Scholar
CGL83 T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing the sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.Google ScholarCross Ref
CS96 S. Chaudhuri and K. Shim. Optimizing queries with aggregate views. In P. M. G. Apers, M. Bouzeghoub, and G. Gardarin, editors, Advances in Database Technology- EDBT'96 5th Intl. Conj. on Extending Database Technology, volume 1057 of Lecture Notes in Computer Science, pages 167-182. Springer-Verlag, New York, 1996. Google ScholarDigital Library
DKO+84 D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Stonebraker, and D. Wood. Implementation techniques for main memory database systems. In Proc. ACM-SIGMOD Intl. Conf. Management of Data, pages 1-8, Boston, June 1984. Google ScholarDigital Library
GBLP96 J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing Group-By, Cross-Tab, and Sub-Totals. In Proc. of the I2th Intl. Conf. Data Engineering, pages 152-159, 1996. Google ScholarDigital Library
GHQ95 A. Gupta, V. Harinarayan, and D. Quass. Aggregatequery processing in data warehousing environments. In Proc. 21st Intl. Con}. Very Large Data Bases, Zurich, September 1995, pages 358-369. Google ScholarDigital Library
Haa96a P. J. Haas. Hoeffding inequalities for join-selectivity estimation and online aggregation. IBM Research Report RJ 10040, IBM Almaden Research Center, San Jose, CA, 1996.Google Scholar
Haa96b P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. IBM Research Report RJ 10050, IBM Almaden Research Center, San Jose, CA, 1996.Google Scholar
HN96 J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In Proc. ACM- SIGMOD Intl. Con}. Management of Data, Montreal, June 1996, pages 423-424. Google ScholarDigital Library
HNSS96 P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on random sampling. J. Comput. System Sci., 52:550-569, 1996. Google ScholarDigital Library
HOD91 W.-C. Hou, G. ()zsoyoglu, and E. Dogdu. Errorconstrained count query evaluation in relational databases. In Proceedir~gs, 1991 ACM-SIGMOD Intl. CoT~.f. Managrnent of Data, pages 278-287. ACM Press, 1991. Google Scholar
Hoe63 W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13- 30, 1963.Google ScholarCross Ref
HOT88 W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Statistical estimators for relational algebra expressions. In Proc. 7th .4CM SIGACT-SIGMOD-SIGART Symposium on Principles o/Database Systems, pages 276-287, Austin, March 1988. Google ScholarDigital Library
HOT89 W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Processing aggregate relational queries with hard time constraints. In Proc. A CM-SIGMOD Intl. Con}. Management of Data, pages 68-77, Portland, May-June 1989. Google ScholarDigital Library
LNSS93 R. J. Lipton, J. F. Naughton, D. A. Schneider, and S. Seshadri. Efficient sampling strategies for relational database operations. Theoretical Computer Science, 116:195- 226, 1993. Google ScholarDigital Library
Mye85 B. A. Myers. The importance of percent-done progress indicators for computer-human interfaces. In Proc. SIG CHI '85: Human Factors in Computing Systems, pages 11-17, April 1985. Google ScholarDigital Library
Olk93 F. Olken. Random Sampling from Databases. PhD thesis, University of California, Berkeley, 1993.Google Scholar
Pos95 Postgres95 home page,1995. UFI.L http://www, ki.net / postgres95.Google Scholar
SAC+79 P. G. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. In Proc. A CM-SIGMOD Intl. Conf. Management o.f Data, pages 22-34, Boston, June 1979. Google ScholarDigital Library
SHP+96 P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. C. Leung, It. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan. Cost-based optimization for magic: Algebra and implementation. In Proc. A CM-SIGMOD intl. Conf. Management o/Data, Montreal, June 1996, pages 435-446. Google ScholarDigital Library
SPL96 P. Seshadri, H. Pirahesh, and T. C. Leung. Complex query decorrelation. In Proc. I2th IEEE Intl. Conf. Data Engineering, New Orleans, February 1996. Google ScholarDigital Library
VU92 M. Vetterli and K. M. Uz. Multiresolution coding techniques for digital video: A review. Multidimensional Systems and Signal Processing, 3:161-187, 1992. Google ScholarDigital Library
VL93 S. V. Vrbsky and J. W. S. Liu. APPROXIMATE m A query processor that produces monotonically improving approximate answers. IEEE Transactions on Knowledge and Data Engineering, 5(6):1056-1068, 1993. Google ScholarDigital Library
WA91 A. Wilschut and P. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Information Systems, pages 68-77, Dec 1991. Google ScholarDigital Library
YL95 W. P. Yan and P.-A. Larson. Eager aggregation and lazy aggregation. In Proc. 21st Intl. Conf. Very Large Data Bases, Zurich, September 1995, pages 345-357. Google ScholarDigital Library

Index Terms

Online aggregation

Recommendations

Online aggregation
SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data

Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to ...
Read More
Ripple joins for online aggregation

We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support ...
Read More
Ripple joins for online aggregation
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data

We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGMOD Record Volume 26, Issue 2
June 1997
583 pages
ISSN:0163-5808
DOI:10.1145/253262
Chairman:
Sudha Ram
Univ. of Arizona, Tucson
,
Editor:
Joan M. Peckham
Univ. of Rhode Island, Kingston
Issue’s Table of Contents
SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data
June 1997
594 pages
ISBN:0897919114
DOI:10.1145/253260
Editors:
Joan M. Peckman
Univ. of Rhode Island, Kingston
,
Sudha Ram
Univ. of Arizona, Tucson
,
Michael Franklin
Copyright © 1997 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1997
Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 807
  Total Citations
  View Citations
- 4,004
  Total Downloads
- Downloads (Last 12 months)448
- Downloads (Last 6 weeks)45
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Online aggregation

ACM SIGMOD Record

Abstract

References

Cited By

Index Terms

Recommendations

Online aggregation

Ripple joins for online aggregation

Ripple joins for online aggregation