Article

When can we trust progress estimators for SQL queries?

Authors:
Surajit Chaudhuri

Microsoft Research

Microsoft Research
View Profile

,
Raghav Kaushik

Microsoft Research

Microsoft Research
View Profile

,
Ravishankar Ramamurthy

Microsoft Research

Microsoft Research
View Profile

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of dataJune 2005Pages 575–586https://doi.org/10.1145/1066157.1066223

Published:14 June 2005Publication History

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data

Pages 575–586

ABSTRACT

The problem of estimating progress for long-running queries has recently been introduced. We analyze the characteristics of the progress estimation problem, from the perspective of providing robust, worst-case guarantees. Our first result is that in the worst case, no progress estimation algorithm can yield anything even moderately better than the trivial guarantee that identifies the progress as lying between 0% and 100%. In such cases, we introduce an estimator that can optimally bound the error. However, we show that in many "good" scenarios, it is possible to design effective progress estimators with small error bounds. We then demonstrate empirically that these "good" scenarios are common in practice and discuss possible ways of combining the estimators.

References

S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD, 1999. Google ScholarDigital Library
N. Bruno and S. Chaudhuri. Statistics on query expressions. In SIGMOD, 2002.Google Scholar
S. Chaudhuri, R. Motwani, and V. Narasayya. On random sampling over joins. In SIGMOD, 1999. Google ScholarDigital Library
S. Chaudhuri and V. Narasayya. The sky server database. http://skyserver.sdss.org.Google Scholar
S. Chaudhuri, V. Narasayya, and R. Ramamurthy. Estimating progress of long-running queries. In SIGMOD, 2004. Google ScholarDigital Library
P. Haas. Large-sample and deterministic confidence intervals for online aggregation. In SSDBM, 1997. Google ScholarDigital Library
P. Haas and J. Hellerstein. Ripple joins for online aggregation. In SIGMOD, 1999. Google ScholarDigital Library
P. Haas, J. Naughton, S. Seshadri, and A. Swami. Selectivity and cost estimation for joins based on random sampling. Journal of Computer and System Sciences, 1996. Google ScholarDigital Library
J. Hellerstein, P. Haas, and H. Wang. Online aggregation. In SIGMOD, 1997. Google ScholarDigital Library
Y. Ioannadis. Query optimization. ACM Computing Surveys, 1996. Google ScholarDigital Library
Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, 1991. Google ScholarDigital Library
N. Kabra and D. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In SIGMOD, 1998. Google ScholarDigital Library
G. Luo, J. Naughton, C. Ellmann, and M. Watzke. Towards a progress indicator for database queries. In SIGMOD, 2004. Google ScholarDigital Library
G. Luo, J. Naughton, C. Ellmann, and M. Watzke. Increasing the accuracy and coverage of SQL progress indicators. In ICDE, 2005. Google ScholarDigital Library
V. Markl, V. Raman, D. E. Simmen, G. M. Lohman, and H. Pirahesh. Robust query processing through progressive optimization. In SIGMOD, 2004. Google ScholarDigital Library
V. Poosala and Y. E. Ioannidis. Balancing histogram optimality and practicality for query result size estimation. In SIGMOD, 1995. Google ScholarDigital Library
The TPCH Benchmark. http://www.tpc.org.Google Scholar
Program for TPC-D Data Generation with Skew. ftp://ftp.research.microsoft.com/users/viveknar/tpcdskew.Google Scholar

When can we trust progress estimators for SQL queries?
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Recommendations

Sampling-based estimators for subset-based queries

We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator ...
Read More
Multi-query SQL progress indicators
EDBT'06: Proceedings of the 10th international conference on Advances in Database Technology

Recently, progress indicators have been proposed for SQL queries in RDBMSs. All previously proposed progress indicators consider each query in isolation, ignoring the impact simultaneously running queries have on each other’s performance. In this paper, ...
Read More
Performance analysis of multivariate complex amplitude estimators
Part II

We consider multivariate complex amplitude estimation in the presence of unknown interference and noise. Two multivariate approaches [Maximum Likelihood (ML) and Capon] are provided. We derive the closed-form expression of the Crame´r-Rao bound (...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
June 2005
990 pages
ISBN:1595930604
DOI:10.1145/1066157
Conference Chair:
Fatma Ozcan
IBM Almaden Research Center
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 June 2005
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 41
  Total Citations
  View Citations
- 646
  Total Downloads
- Downloads (Last 12 months)6
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

When can we trust progress estimators for SQL queries?

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Recommendations

Sampling-based estimators for subset-based queries

Multi-query SQL progress indicators

Performance analysis of multivariate complex amplitude estimators

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

When can we trust progress estimators for SQL queries?

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Recommendations

Sampling-based estimators for subset-based queries

Multi-query SQL progress indicators

Performance analysis of multivariate complex amplitude estimators

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media