ABSTRACT
The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we treat the problem as an optimization problem whose goal is to minimize the error in answering queries in the given workload. A key novelty of our approach is that we can tailor the choice of samples to be robust even for workloads that are “similar” but not necessarily identical to the given workload. Finally, our techniques recognize the importance of taking into account the variance in the data distribution in a principled manner. We show how our solution can be implemented on a database system, and present results of extensive experiments on Microsoft SQL Server 2000 that demonstrate the superior quality of our method compared to previous work.
- 1.Acharya S., Gibbons P.B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries. Proc. of ACM SIGMOD, 2000. Google ScholarDigital Library
- 2.Acharya S., Gibbons P.B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering. Proc. of ACM SIGMOD, 1999. Google ScholarDigital Library
- 3.Barbara D., and Sullivan M. Quasi-Cubes. Exploiting Approximations in Multidimensional Databases. SIGMOD Record, Vol. 26 No. 3, Sep. 1997. Google ScholarDigital Library
- 4.Barbara D. and Wu. X. Using Approximations to Scale Exploratory Data Analysis in Datacubes. Proceedings of the 1999 ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, San Diego, CA, Aug. 1999. Google ScholarDigital Library
- 5.Chakrabarti K., Garofalakis M., Rastogi R., Shim K. Approximate Query Processing Using Wavelets. Proc. of VLDB 2000. Google ScholarDigital Library
- 6.Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V. Overcoming Limitations of Sampling for Aggregation Queries. Proc. of IEEE Conf. on Data Engineering, 2001. Google ScholarDigital Library
- 7.Chaudhuri S., Das G., Narasayya V. A Robust, Optimization- Based Approach for Approximate Answering of Aggregation Queries. Microsoft Research Technical Report MSR-TR-2001-37.Google Scholar
- 8.Chaudhuri S., Motwani R., Narasayya V. Random Sampling Over Joins. Proc. of ACM SIGMOD, 1999. Google ScholarDigital Library
- 9.Chaudhuri S., Narasayya V. Program for TPC-D Data Generation with Skew. http://research.microsoft.com/dmx/Google Scholar
- 10.Cochran W.G. Sampling Techniques. John Wiley & Sons, New York, Third edition, 1977.Google Scholar
- 11.Ganti V., Lee M.L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering. Proc. of VLDB, 2000. Google ScholarDigital Library
- 12.Golub G., Loan C. Matrix Computations. Johns Hopkins University Press, 1989.Google Scholar
- 13.Hellerstein J., Haas P., Wang H. Online Aggregation. Proc. of ACM SIGMOD,1997. Google ScholarDigital Library
- 14.HochBaum B. Approximation Algorithms for NP-Hard Problems. PWS Publishing, 1997. Google ScholarDigital Library
- 15.Ioannidis Y., Poosala V. Histogram Based Approximations of Set- Valued Query Answers. Proc. of VLDB 1999. Google ScholarDigital Library
- 16.Marron J.S, Smoothing Methods for Learning from Data. Proc. of KDD 1998, Tutorial.Google Scholar
- 17.Poosala V., Ganti V. Fast Approximate Answers to Aggregate Queries on a Data Cube. Proc. of the 1999, Intl. Conf. on Scientific and Statistical Database Management. Google ScholarDigital Library
- 18.Silverman, B. W. Density Estimation. Chapman and Hall. 1986.Google Scholar
- 19.Thisted R.A. Elements of Statistical Computing, Chapman and Hall. 1988. Google ScholarDigital Library
- 20.TPC Benchmark R. Decision Support. Revision 1.1.0. http://www.tpc.org.Google Scholar
- 21.Vitter J. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, 11(1):37-57, March 1985. Google ScholarDigital Library
- 22.Vitter J., Wang M. Approximate Computation of Multidimensional Aggregates of Sparse Data using Wavelet. Proc. of ACM SIGMOD, 1999. Google ScholarDigital Library
- 23.Vitter J., Wang M., Iyer B. Data Cube Approximation and Histogram via Wavelets. Conf. on Information and Knowledge Management, 1998. Google ScholarDigital Library
- 24.Zipf G.E. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Inc, 1949.Google Scholar
Index Terms
- A robust, optimization-based approach for approximate answering of aggregate queries
Recommendations
A robust, optimization-based approach for approximate answering of aggregate queries
The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we treat the problem as an optimization problem whose goal ...
Congressional samples for approximate answering of group-by queries
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into ...
Congressional samples for approximate answering of group-by queries
SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of dataIn large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into ...
Comments