skip to main content
10.1145/375663.375694acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

A robust, optimization-based approach for approximate answering of aggregate queries

Published:01 May 2001Publication History

ABSTRACT

The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we treat the problem as an optimization problem whose goal is to minimize the error in answering queries in the given workload. A key novelty of our approach is that we can tailor the choice of samples to be robust even for workloads that are “similar” but not necessarily identical to the given workload. Finally, our techniques recognize the importance of taking into account the variance in the data distribution in a principled manner. We show how our solution can be implemented on a database system, and present results of extensive experiments on Microsoft SQL Server 2000 that demonstrate the superior quality of our method compared to previous work.

References

  1. 1.Acharya S., Gibbons P.B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries. Proc. of ACM SIGMOD, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.Acharya S., Gibbons P.B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering. Proc. of ACM SIGMOD, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.Barbara D., and Sullivan M. Quasi-Cubes. Exploiting Approximations in Multidimensional Databases. SIGMOD Record, Vol. 26 No. 3, Sep. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.Barbara D. and Wu. X. Using Approximations to Scale Exploratory Data Analysis in Datacubes. Proceedings of the 1999 ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, San Diego, CA, Aug. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.Chakrabarti K., Garofalakis M., Rastogi R., Shim K. Approximate Query Processing Using Wavelets. Proc. of VLDB 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V. Overcoming Limitations of Sampling for Aggregation Queries. Proc. of IEEE Conf. on Data Engineering, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.Chaudhuri S., Das G., Narasayya V. A Robust, Optimization- Based Approach for Approximate Answering of Aggregation Queries. Microsoft Research Technical Report MSR-TR-2001-37.Google ScholarGoogle Scholar
  8. 8.Chaudhuri S., Motwani R., Narasayya V. Random Sampling Over Joins. Proc. of ACM SIGMOD, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.Chaudhuri S., Narasayya V. Program for TPC-D Data Generation with Skew. http://research.microsoft.com/dmx/Google ScholarGoogle Scholar
  10. 10.Cochran W.G. Sampling Techniques. John Wiley & Sons, New York, Third edition, 1977.Google ScholarGoogle Scholar
  11. 11.Ganti V., Lee M.L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering. Proc. of VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.Golub G., Loan C. Matrix Computations. Johns Hopkins University Press, 1989.Google ScholarGoogle Scholar
  13. 13.Hellerstein J., Haas P., Wang H. Online Aggregation. Proc. of ACM SIGMOD,1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14.HochBaum B. Approximation Algorithms for NP-Hard Problems. PWS Publishing, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.Ioannidis Y., Poosala V. Histogram Based Approximations of Set- Valued Query Answers. Proc. of VLDB 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16.Marron J.S, Smoothing Methods for Learning from Data. Proc. of KDD 1998, Tutorial.Google ScholarGoogle Scholar
  17. 17.Poosala V., Ganti V. Fast Approximate Answers to Aggregate Queries on a Data Cube. Proc. of the 1999, Intl. Conf. on Scientific and Statistical Database Management. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18.Silverman, B. W. Density Estimation. Chapman and Hall. 1986.Google ScholarGoogle Scholar
  19. 19.Thisted R.A. Elements of Statistical Computing, Chapman and Hall. 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20.TPC Benchmark R. Decision Support. Revision 1.1.0. http://www.tpc.org.Google ScholarGoogle Scholar
  21. 21.Vitter J. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, 11(1):37-57, March 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. 22.Vitter J., Wang M. Approximate Computation of Multidimensional Aggregates of Sparse Data using Wavelet. Proc. of ACM SIGMOD, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. 23.Vitter J., Wang M., Iyer B. Data Cube Approximation and Histogram via Wavelets. Conf. on Information and Knowledge Management, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24.Zipf G.E. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Inc, 1949.Google ScholarGoogle Scholar

Index Terms

  1. A robust, optimization-based approach for approximate answering of aggregate queries

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
      May 2001
      630 pages
      ISBN:1581133324
      DOI:10.1145/375663

      Copyright © 2001 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 May 2001

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      SIGMOD '01 Paper Acceptance Rate44of293submissions,15%Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader