ABSTRACT
The problem of estimating progress for long-running queries has recently been introduced. We analyze the characteristics of the progress estimation problem, from the perspective of providing robust, worst-case guarantees. Our first result is that in the worst case, no progress estimation algorithm can yield anything even moderately better than the trivial guarantee that identifies the progress as lying between 0% and 100%. In such cases, we introduce an estimator that can optimally bound the error. However, we show that in many "good" scenarios, it is possible to design effective progress estimators with small error bounds. We then demonstrate empirically that these "good" scenarios are common in practice and discuss possible ways of combining the estimators.
- S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD, 1999. Google ScholarDigital Library
- N. Bruno and S. Chaudhuri. Statistics on query expressions. In SIGMOD, 2002.Google Scholar
- S. Chaudhuri, R. Motwani, and V. Narasayya. On random sampling over joins. In SIGMOD, 1999. Google ScholarDigital Library
- S. Chaudhuri and V. Narasayya. The sky server database. http://skyserver.sdss.org.Google Scholar
- S. Chaudhuri, V. Narasayya, and R. Ramamurthy. Estimating progress of long-running queries. In SIGMOD, 2004. Google ScholarDigital Library
- P. Haas. Large-sample and deterministic confidence intervals for online aggregation. In SSDBM, 1997. Google ScholarDigital Library
- P. Haas and J. Hellerstein. Ripple joins for online aggregation. In SIGMOD, 1999. Google ScholarDigital Library
- P. Haas, J. Naughton, S. Seshadri, and A. Swami. Selectivity and cost estimation for joins based on random sampling. Journal of Computer and System Sciences, 1996. Google ScholarDigital Library
- J. Hellerstein, P. Haas, and H. Wang. Online aggregation. In SIGMOD, 1997. Google ScholarDigital Library
- Y. Ioannadis. Query optimization. ACM Computing Surveys, 1996. Google ScholarDigital Library
- Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, 1991. Google ScholarDigital Library
- N. Kabra and D. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In SIGMOD, 1998. Google ScholarDigital Library
- G. Luo, J. Naughton, C. Ellmann, and M. Watzke. Towards a progress indicator for database queries. In SIGMOD, 2004. Google ScholarDigital Library
- G. Luo, J. Naughton, C. Ellmann, and M. Watzke. Increasing the accuracy and coverage of SQL progress indicators. In ICDE, 2005. Google ScholarDigital Library
- V. Markl, V. Raman, D. E. Simmen, G. M. Lohman, and H. Pirahesh. Robust query processing through progressive optimization. In SIGMOD, 2004. Google ScholarDigital Library
- V. Poosala and Y. E. Ioannidis. Balancing histogram optimality and practicality for query result size estimation. In SIGMOD, 1995. Google ScholarDigital Library
- The TPCH Benchmark. http://www.tpc.org.Google Scholar
- Program for TPC-D Data Generation with Skew. ftp://ftp.research.microsoft.com/users/viveknar/tpcdskew.Google Scholar
- When can we trust progress estimators for SQL queries?
Recommendations
Sampling-based estimators for subset-based queries
We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator ...
Multi-query SQL progress indicators
EDBT'06: Proceedings of the 10th international conference on Advances in Database TechnologyRecently, progress indicators have been proposed for SQL queries in RDBMSs. All previously proposed progress indicators consider each query in isolation, ignoring the impact simultaneously running queries have on each other’s performance. In this paper, ...
Performance analysis of multivariate complex amplitude estimators
Part IIWe consider multivariate complex amplitude estimation in the presence of unknown interference and noise. Two multivariate approaches [Maximum Likelihood (ML) and Capon] are provided. We derive the closed-form expression of the Crame´r-Rao bound (...
Comments