Article

A robust, optimization-based approach for approximate answering of aggregate queries

Authors:
Surajit Chaudhuri

Microsoft Research, One Microsoft Way, Redmond, WA

Microsoft Research, One Microsoft Way, Redmond, WA
View Profile

,
Gautam Das

Microsoft Research, One Microsoft Way, Redmond, WA

Microsoft Research, One Microsoft Way, Redmond, WA
View Profile

,
Vivek Narasayya

Microsoft Research, One Microsoft Way, Redmond, WA

Microsoft Research, One Microsoft Way, Redmond, WA
View Profile

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of dataMay 2001Pages 295–306https://doi.org/10.1145/375663.375694

Published:01 May 2001Publication History

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data

Pages 295–306

ABSTRACT

The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we treat the problem as an optimization problem whose goal is to minimize the error in answering queries in the given workload. A key novelty of our approach is that we can tailor the choice of samples to be robust even for workloads that are “similar” but not necessarily identical to the given workload. Finally, our techniques recognize the importance of taking into account the variance in the data distribution in a principled manner. We show how our solution can be implemented on a database system, and present results of extensive experiments on Microsoft SQL Server 2000 that demonstrate the superior quality of our method compared to previous work.

References

1.Acharya S., Gibbons P.B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries. Proc. of ACM SIGMOD, 2000. Google ScholarDigital Library
2.Acharya S., Gibbons P.B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering. Proc. of ACM SIGMOD, 1999. Google ScholarDigital Library
3.Barbara D., and Sullivan M. Quasi-Cubes. Exploiting Approximations in Multidimensional Databases. SIGMOD Record, Vol. 26 No. 3, Sep. 1997. Google ScholarDigital Library
4.Barbara D. and Wu. X. Using Approximations to Scale Exploratory Data Analysis in Datacubes. Proceedings of the 1999 ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, San Diego, CA, Aug. 1999. Google ScholarDigital Library
5.Chakrabarti K., Garofalakis M., Rastogi R., Shim K. Approximate Query Processing Using Wavelets. Proc. of VLDB 2000. Google ScholarDigital Library
6.Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V. Overcoming Limitations of Sampling for Aggregation Queries. Proc. of IEEE Conf. on Data Engineering, 2001. Google ScholarDigital Library
7.Chaudhuri S., Das G., Narasayya V. A Robust, Optimization- Based Approach for Approximate Answering of Aggregation Queries. Microsoft Research Technical Report MSR-TR-2001-37.Google Scholar
8.Chaudhuri S., Motwani R., Narasayya V. Random Sampling Over Joins. Proc. of ACM SIGMOD, 1999. Google ScholarDigital Library
9.Chaudhuri S., Narasayya V. Program for TPC-D Data Generation with Skew. http://research.microsoft.com/dmx/Google Scholar
10.Cochran W.G. Sampling Techniques. John Wiley & Sons, New York, Third edition, 1977.Google Scholar
11.Ganti V., Lee M.L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering. Proc. of VLDB, 2000. Google ScholarDigital Library
12.Golub G., Loan C. Matrix Computations. Johns Hopkins University Press, 1989.Google Scholar
13.Hellerstein J., Haas P., Wang H. Online Aggregation. Proc. of ACM SIGMOD,1997. Google ScholarDigital Library
14.HochBaum B. Approximation Algorithms for NP-Hard Problems. PWS Publishing, 1997. Google ScholarDigital Library
15.Ioannidis Y., Poosala V. Histogram Based Approximations of Set- Valued Query Answers. Proc. of VLDB 1999. Google ScholarDigital Library
16.Marron J.S, Smoothing Methods for Learning from Data. Proc. of KDD 1998, Tutorial.Google Scholar
17.Poosala V., Ganti V. Fast Approximate Answers to Aggregate Queries on a Data Cube. Proc. of the 1999, Intl. Conf. on Scientific and Statistical Database Management. Google ScholarDigital Library
18.Silverman, B. W. Density Estimation. Chapman and Hall. 1986.Google Scholar
19.Thisted R.A. Elements of Statistical Computing, Chapman and Hall. 1988. Google ScholarDigital Library
20.TPC Benchmark R. Decision Support. Revision 1.1.0. http://www.tpc.org.Google Scholar
21.Vitter J. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, 11(1):37-57, March 1985. Google ScholarDigital Library
22.Vitter J., Wang M. Approximate Computation of Multidimensional Aggregates of Sparse Data using Wavelet. Proc. of ACM SIGMOD, 1999. Google ScholarDigital Library
23.Vitter J., Wang M., Iyer B. Data Cube Approximation and Histogram via Wavelets. Conf. on Information and Knowledge Management, 1998. Google ScholarDigital Library
24.Zipf G.E. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Inc, 1949.Google Scholar

Index Terms

A robust, optimization-based approach for approximate answering of aggregate queries
1. Computer systems organization
  1. Architectures

Recommendations

A robust, optimization-based approach for approximate answering of aggregate queries

The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we treat the problem as an optimization problem whose goal ...
Read More
Congressional samples for approximate answering of group-by queries

In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into ...
Read More
Congressional samples for approximate answering of group-by queries
SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data

In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN:1581133324
DOI:10.1145/375663
Editors:
Timos Sellis,
Sharad Mehrotra
ACM SIGMOD Record Volume 30, Issue 2
June 2001
625 pages
ISSN:0163-5808
DOI:10.1145/376284
Editors:
Timos Sellis
National Technical Univ. of Athens
,
Sharad Mehrotra
Univ. of California at Irvine
Issue’s Table of Contents
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 2001
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
SIGMOD '01 Paper Acceptance Rate44of293submissions,15%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 86
  Total Citations
  View Citations
- 1,179
  Total Downloads
- Downloads (Last 12 months)12
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A robust, optimization-based approach for approximate answering of aggregate queries

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

A robust, optimization-based approach for approximate answering of aggregate queries

Congressional samples for approximate answering of group-by queries

Congressional samples for approximate answering of group-by queries