Abstract
Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in POSTGRES.
- AAD+96 A. Agarwal, R. Agrawal, P. hi. Deshpande, A. Gupta, J. F. Naughton, F{. F{amakrishnan. and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 22nd Intl. Conf. Very Large Data Bases, Mumbai(Bombay), September 1996. Google ScholarDigital Library
- ACSW96 A. Aiken, J. Chen, M. Stonebraker, and A. Woodruff. Tioga-2: A direct manipulation database visualization environment. In Proc. of the I 2th Intl. Conf. Data Engineering, pages 208-217, New Orleans, February 1996. Google ScholarDigital Library
- AZ96 G. Antoshenkov and M Ziauddin. Query processing and optimization in Oracle Rdb. VLDB Journal, 5(4):229-237. 1996. Google ScholarDigital Library
- Bil86 P. Billingsley. Probability and Measure. Wiley, New York, second edition, 1986.Google Scholar
- BM96 R. J. Bayardo, Jr. and D. P. Miranker. Processing queries for first-few answers. In Fifth Intl. Con}. Information and Knowledge Management, pages 45-52, Rockville, Maryland, 1996. Google ScholarDigital Library
- Bra84 K. Bratbergsengen. Hashing methods and relational algebra operations. In Proc. I Oth Intl. Conf. Very Large Data Bases, pages 323-333, Singapore, August 1984. Google ScholarDigital Library
- CCS93 E. F. Codd, S. B. Codd, and C. T. Sal- Icy. Providing OLAP (on-line analyticalprocessing) to user-analysts: an IT mandate.URL http://www.arborsoft.com/papers/coddTOC.htmI., I993.Google Scholar
- CGL83 T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing the sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.Google ScholarCross Ref
- CS96 S. Chaudhuri and K. Shim. Optimizing queries with aggregate views. In P. M. G. Apers, M. Bouzeghoub, and G. Gardarin, editors, Advances in Database Technology- EDBT'96 5th Intl. Conj. on Extending Database Technology, volume 1057 of Lecture Notes in Computer Science, pages 167-182. Springer-Verlag, New York, 1996. Google ScholarDigital Library
- DKO+84 D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Stonebraker, and D. Wood. Implementation techniques for main memory database systems. In Proc. ACM-SIGMOD Intl. Conf. Management of Data, pages 1-8, Boston, June 1984. Google ScholarDigital Library
- GBLP96 J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing Group-By, Cross-Tab, and Sub-Totals. In Proc. of the I2th Intl. Conf. Data Engineering, pages 152-159, 1996. Google ScholarDigital Library
- GHQ95 A. Gupta, V. Harinarayan, and D. Quass. Aggregatequery processing in data warehousing environments. In Proc. 21st Intl. Con}. Very Large Data Bases, Zurich, September 1995, pages 358-369. Google ScholarDigital Library
- Haa96a P. J. Haas. Hoeffding inequalities for join-selectivity estimation and online aggregation. IBM Research Report RJ 10040, IBM Almaden Research Center, San Jose, CA, 1996.Google Scholar
- Haa96b P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. IBM Research Report RJ 10050, IBM Almaden Research Center, San Jose, CA, 1996.Google Scholar
- HN96 J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In Proc. ACM- SIGMOD Intl. Con}. Management of Data, Montreal, June 1996, pages 423-424. Google ScholarDigital Library
- HNSS96 P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on random sampling. J. Comput. System Sci., 52:550-569, 1996. Google ScholarDigital Library
- HOD91 W.-C. Hou, G. ()zsoyoglu, and E. Dogdu. Errorconstrained count query evaluation in relational databases. In Proceedir~gs, 1991 ACM-SIGMOD Intl. CoT~.f. Managrnent of Data, pages 278-287. ACM Press, 1991. Google Scholar
- Hoe63 W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13- 30, 1963.Google ScholarCross Ref
- HOT88 W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Statistical estimators for relational algebra expressions. In Proc. 7th .4CM SIGACT-SIGMOD-SIGART Symposium on Principles o/Database Systems, pages 276-287, Austin, March 1988. Google ScholarDigital Library
- HOT89 W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Processing aggregate relational queries with hard time constraints. In Proc. A CM-SIGMOD Intl. Con}. Management of Data, pages 68-77, Portland, May-June 1989. Google ScholarDigital Library
- LNSS93 R. J. Lipton, J. F. Naughton, D. A. Schneider, and S. Seshadri. Efficient sampling strategies for relational database operations. Theoretical Computer Science, 116:195- 226, 1993. Google ScholarDigital Library
- Mye85 B. A. Myers. The importance of percent-done progress indicators for computer-human interfaces. In Proc. SIG CHI '85: Human Factors in Computing Systems, pages 11-17, April 1985. Google ScholarDigital Library
- Olk93 F. Olken. Random Sampling from Databases. PhD thesis, University of California, Berkeley, 1993.Google Scholar
- Pos95 Postgres95 home page,1995. UFI.L http://www, ki.net / postgres95.Google Scholar
- SAC+79 P. G. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. In Proc. A CM-SIGMOD Intl. Conf. Management o.f Data, pages 22-34, Boston, June 1979. Google ScholarDigital Library
- SHP+96 P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. C. Leung, It. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan. Cost-based optimization for magic: Algebra and implementation. In Proc. A CM-SIGMOD intl. Conf. Management o/Data, Montreal, June 1996, pages 435-446. Google ScholarDigital Library
- SPL96 P. Seshadri, H. Pirahesh, and T. C. Leung. Complex query decorrelation. In Proc. I2th IEEE Intl. Conf. Data Engineering, New Orleans, February 1996. Google ScholarDigital Library
- VU92 M. Vetterli and K. M. Uz. Multiresolution coding techniques for digital video: A review. Multidimensional Systems and Signal Processing, 3:161-187, 1992. Google ScholarDigital Library
- VL93 S. V. Vrbsky and J. W. S. Liu. APPROXIMATE m A query processor that produces monotonically improving approximate answers. IEEE Transactions on Knowledge and Data Engineering, 5(6):1056-1068, 1993. Google ScholarDigital Library
- WA91 A. Wilschut and P. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Information Systems, pages 68-77, Dec 1991. Google ScholarDigital Library
- YL95 W. P. Yan and P.-A. Larson. Eager aggregation and lazy aggregation. In Proc. 21st Intl. Conf. Very Large Data Bases, Zurich, September 1995, pages 345-357. Google ScholarDigital Library
Index Terms
- Online aggregation
Recommendations
Online aggregation
SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of dataAggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to ...
Ripple joins for online aggregation
We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support ...
Ripple joins for online aggregation
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of dataWe present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support ...
Comments