research-article

Robust estimation of resource consumption for SQL queries using statistical techniques

Authors:
Jiexing Li

University of Wisconsin - Madison, Madison, WI

University of Wisconsin - Madison, Madison, WI
View Profile

,
Arnd Christian König

Microsoft Research, Redmond, WA

Microsoft Research, Redmond, WA
View Profile

,
Vivek Narasayya

Microsoft Research, Redmond, WA

Microsoft Research, Redmond, WA
View Profile

,
Surajit Chaudhuri

Microsoft Research, Redmond, WA

Microsoft Research, Redmond, WA
View Profile

Proceedings of the VLDB Endowment Volume 5 Issue 11pp 1555–1566https://doi.org/10.14778/2350229.2350269

Published:01 July 2012Publication History

Proceedings of the VLDB Endowment

Abstract

The ability to estimate resource consumption of SQL queries is crucial for a number of tasks in a database system such as admission control, query scheduling and costing during query optimization. Recent work has explored the use of statistical techniques for resource estimation in place of the manually constructed cost models used in query optimization. Such techniques, which require as training data examples of resource usage in queries, offer the promise of superior estimation accuracy since they can account for factors such as hardware characteristics of the system or bias in cardinality estimates. However, the proposed approaches lack robustness in that they do not generalize well to queries that are different from the training examples, resulting in significant estimation errors. Our approach aims to address this problem by combining knowledge of database query processing with statistical models. We model resource-usage at the level of individual operators, with different models and features for each operator type, and explicitly model the asymptotic behavior of each operator. This results in significantly better estimation accuracy and the ability to estimate resource usage of arbitrary plans, even when they are very different from the training instances. We validate our approach using various large scale real-life and benchmark workloads on Microsoft SQL Server.

References

Decision Tree Learning, Bregman Divergences, and the Histogram Trick. In submission.Google Scholar
Program for TPC-H data generation with Skew. ftp://ftp.research.microsoft.com/users/viveknar/TPCDSkew/.Google Scholar
TPC-H and TPC-DS Benchmarks. http://www.tpc.org.Google Scholar
WEKA workbench. http://www.cs.waikato.ac.nz/ml/weka/.Google Scholar
A. Aboulnaga and S. Chaudhuri. Self-tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD, pages 181--192, 1999. Google ScholarDigital Library
M. Ahmad, S. Duan, A. Aboulnaga, and S. Babu. Predicting Completion Times of Batch Query Workloads using Interaction-aware Models and Simulation. In EDBT, pages 449--460, 2011. Google ScholarDigital Library
M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. Zdonik. The Case for Predictive Database Systems: Opportunities and Challenges. In CIDR, pages 167--174, 2011.Google Scholar
M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. Zdonik. Learning-based Query Performance Modeling and Prediction. In ICDE, 2012. Google ScholarDigital Library
N. Bruno, S. Chaudhuri, and L. Gravano. STHoles: A Multidimensional Workload-Aware Histogram. In SIGMOD, pages 211--222, 2001. Google ScholarDigital Library
C. Curino, E. P. C. Jones, S. Madden, and H. Balakrishnan. Workload-aware Database Monitoring and Consolidation. In SIGMOD, pages 313--324, 2011. Google ScholarDigital Library
D. J. DeWitt, J. F. Naughton, and J. Burger. Nested Loops Revisited. In PDIS, pages 230--242, 1993. Google ScholarDigital Library
J. Duggan, U. Cetintemel, O. Papaemmanouil, and E. Upfal. Performance Prediction for Concurrent Database Workloads. In SIGMOD, pages 337--348, 2011. Google ScholarDigital Library
M. Elhemali, C. A. Galindo-Legaria, T. Grabs, and M. M. Joshi. Execution Strategies for SQL Subqueries. In SIGMOD, pages 993--1004, 2007. Google ScholarDigital Library
J. Friedman. Greedy Function Approximation: a Gradient Boosting Machine. Annals of Statistics, 29(5):1189--1232, 2001.Google ScholarCross Ref
A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In ICDE, pages 592--603, 2009. Google ScholarDigital Library
G. Graefe. Query Evaluation Techniques for Large Databases. ACM Comput. Surv., 25(2):73--169, 1993. Google ScholarDigital Library
H. Herodotou and S. Babu. Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs. In VLDB, pages 1111--1122, 2011.Google ScholarDigital Library
E. P. D. Pednault. Transform Regression and the Kolmogorov Superposition Theorem. In SDM, pages 35--46, 2006.Google ScholarCross Ref
M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - DB2's LEarning Optimizer. In VLDB, pages 19--28, 2001. Google ScholarDigital Library
V. Vapnik. Statistical Learning Theory. Wiley, 2000.Google Scholar
Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Ranking, Boosting, and Model Adaptation. Technical report, Microsoft Research, 2008.Google Scholar
N. Zhang, P. J. Haas, V. Josifovski, G. M. Lohman, and C. Zhang. Statistical Learning Techniques for Costing XML Queries. In VLDB, pages 289--300, 2005. Google ScholarDigital Library

Recommendations

Online estimation for subset-based SQL queries
VLDB '05: Proceedings of the 31st international conference on Very large data bases

The largest databases in use today are so large that answering a query exactly can take minutes, hours, or even days. One way to address this problem is to make use of approximation algorithms. Previous work on online aggregation has considered how to ...
Read More
Robust variance estimation for random effects meta-analysis

In random effects meta-analysis, an overall effect is estimated using a weighted mean, with weights based on estimated marginal variances. The variance of the overall effect is often estimated using the inverse of the sum of the estimated weights, and ...
Read More
Equivalence of SQL queries in presence of embedded dependencies
PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

We consider the problem of finding equivalent minimal-size reformulations of SQL queries in presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive (CQ) queries, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 5, Issue 11
July 2012
608 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 July 2012
Published in pvldb Volume 5, Issue 11
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 52
  Total Citations
  View Citations
- 313
  Total Downloads
- Downloads (Last 12 months)22
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Robust estimation of resource consumption for SQL queries using statistical techniques

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Online estimation for subset-based SQL queries

Robust variance estimation for random effects meta-analysis

Equivalence of SQL queries in presence of embedded dependencies

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Robust estimation of resource consumption for SQL queries using statistical techniques

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Online estimation for subset-based SQL queries

Robust variance estimation for random effects meta-analysis

Equivalence of SQL queries in presence of embedded dependencies

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media