skip to main content
10.1145/253260.253291acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free Access

Online aggregation

Published:01 June 1997Publication History

ABSTRACT

Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in POSTGRES.

References

  1. AAD+96.A. Agarwal, R. Agrawal, P. hi. Deshpande, A. Gupta, J. F. Naughton, F{. F{amakrishnan. and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 22nd Intl. Conf. Very Large Data Bases, Mumbai(Bombay), September 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. ACSW96.A. Aiken, J. Chen, M. Stonebraker, and A. Woodruff. Tioga-2: A direct manipulation database visualization environment. In Proc. of the I 2th Intl. Conf. Data Engineering, pages 208-217, New Orleans, February 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. AZ96.G. Antoshenkov and M Ziauddin. Query processing and optimization in Oracle Rdb. VLDB Journal, 5(4):229-237. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bil86.P. Billingsley. Probability and Measure. Wiley, New York, second edition, 1986.Google ScholarGoogle Scholar
  5. BM96.R. J. Bayardo, Jr. and D. P. Miranker. Processing queries for first-few answers. In Fifth Intl. Con}. Information and Knowledge Management, pages 45-52, Rockville, Maryland, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Bra84.K. Bratbergsengen. Hashing methods and relational algebra operations. In Proc. I Oth Intl. Conf. Very Large Data Bases, pages 323-333, Singapore, August 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. CCS93.E. F. Codd, S. B. Codd, and C. T. Sal- Icy. Providing OLAP (on-line analyticalprocessing) to user-analysts: an IT mandate.URL http://www.arborsoft.com/papers/coddTOC.htmI., I993.Google ScholarGoogle Scholar
  8. CGL83.T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing the sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.Google ScholarGoogle ScholarCross RefCross Ref
  9. CS96.S. Chaudhuri and K. Shim. Optimizing queries with aggregate views. In P. M. G. Apers, M. Bouzeghoub, and G. Gardarin, editors, Advances in Database Technology- EDBT'96 5th Intl. Conj. on Extending Database Technology, volume 1057 of Lecture Notes in Computer Science, pages 167-182. Springer-Verlag, New York, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. DKO+84.D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Stonebraker, and D. Wood. Implementation techniques for main memory database systems. In Proc. ACM-SIGMOD Intl. Conf. Management of Data, pages 1-8, Boston, June 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. GBLP96.J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing Group-By, Cross-Tab, and Sub-Totals. In Proc. of the I2th Intl. Conf. Data Engineering, pages 152-159, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. GHQ95.A. Gupta, V. Harinarayan, and D. Quass. Aggregatequery processing in data warehousing environments. In Proc. 21st Intl. Con}. Very Large Data Bases, Zurich, September 1995, pages 358-369. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Haa96a.P. J. Haas. Hoeffding inequalities for join-selectivity estimation and online aggregation. IBM Research Report RJ 10040, IBM Almaden Research Center, San Jose, CA, 1996.Google ScholarGoogle Scholar
  14. Haa96b.P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. IBM Research Report RJ 10050, IBM Almaden Research Center, San Jose, CA, 1996.Google ScholarGoogle Scholar
  15. HN96.J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In Proc. ACM- SIGMOD Intl. Con}. Management of Data, Montreal, June 1996, pages 423-424. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. HNSS96.P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on random sampling. J. Comput. System Sci., 52:550-569, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. HOD91.W.-C. Hou, G. ()zsoyoglu, and E. Dogdu. Errorconstrained count query evaluation in relational databases. In Proceedir~gs, 1991 ACM-SIGMOD Intl. CoT~.f. Managrnent of Data, pages 278-287. ACM Press, 1991. Google ScholarGoogle Scholar
  18. Hoe63.W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13- 30, 1963.Google ScholarGoogle ScholarCross RefCross Ref
  19. HOT88.W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Statistical estimators for relational algebra expressions. In Proc. 7th .4CM SIGACT-SIGMOD-SIGART Symposium on Principles o/Database Systems, pages 276-287, Austin, March 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. HOT89.W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Processing aggregate relational queries with hard time constraints. In Proc. A CM-SIGMOD Intl. Con}. Management of Data, pages 68-77, Portland, May-June 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. LNSS93.R. J. Lipton, J. F. Naughton, D. A. Schneider, and S. Seshadri. Efficient sampling strategies for relational database operations. Theoretical Computer Science, 116:195- 226, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Mye85.B. A. Myers. The importance of percent-done progress indicators for computer-human interfaces. In Proc. SIG CHI '85: Human Factors in Computing Systems, pages 11-17, April 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Olk93.F. Olken. Random Sampling from Databases. PhD thesis, University of California, Berkeley, 1993.Google ScholarGoogle Scholar
  24. Pos95.Postgres95 home page,1995. UFI.L http://www, ki.net / postgres95.Google ScholarGoogle Scholar
  25. SAC+79.P. G. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. In Proc. A CM-SIGMOD Intl. Conf. Management o.f Data, pages 22-34, Boston, June 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. SHP+96.P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. C. Leung, It. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan. Cost-based optimization for magic: Algebra and implementation. In Proc. A CM-SIGMOD intl. Conf. Management o/Data, Montreal, June 1996, pages 435-446. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. SPL96.P. Seshadri, H. Pirahesh, and T. C. Leung. Complex query decorrelation. In Proc. I2th IEEE Intl. Conf. Data Engineering, New Orleans, February 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. VU92.M. Vetterli and K. M. Uz. Multiresolution coding techniques for digital video: A review. Multidimensional Systems and Signal Processing, 3:161-187, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. VL93.S. V. Vrbsky and J. W. S. Liu. APPROXIMATE m A query processor that produces monotonically improving approximate answers. IEEE Transactions on Knowledge and Data Engineering, 5(6):1056-1068, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. WA91.A. Wilschut and P. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Information Systems, pages 68-77, Dec 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. YL95.W. P. Yan and P.-A. Larson. Eager aggregation and lazy aggregation. In Proc. 21st Intl. Conf. Very Large Data Bases, Zurich, September 1995, pages 345-357. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Online aggregation

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data
              June 1997
              594 pages
              ISBN:0897919114
              DOI:10.1145/253260

              Copyright © 1997 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 June 1997

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              SIGMOD '97 Paper Acceptance Rate42of202submissions,21%Overall Acceptance Rate785of4,003submissions,20%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader