skip to main content
research-article

Robust estimation of resource consumption for SQL queries using statistical techniques

Published:01 July 2012Publication History
Skip Abstract Section

Abstract

The ability to estimate resource consumption of SQL queries is crucial for a number of tasks in a database system such as admission control, query scheduling and costing during query optimization. Recent work has explored the use of statistical techniques for resource estimation in place of the manually constructed cost models used in query optimization. Such techniques, which require as training data examples of resource usage in queries, offer the promise of superior estimation accuracy since they can account for factors such as hardware characteristics of the system or bias in cardinality estimates. However, the proposed approaches lack robustness in that they do not generalize well to queries that are different from the training examples, resulting in significant estimation errors. Our approach aims to address this problem by combining knowledge of database query processing with statistical models. We model resource-usage at the level of individual operators, with different models and features for each operator type, and explicitly model the asymptotic behavior of each operator. This results in significantly better estimation accuracy and the ability to estimate resource usage of arbitrary plans, even when they are very different from the training instances. We validate our approach using various large scale real-life and benchmark workloads on Microsoft SQL Server.

References

  1. Decision Tree Learning, Bregman Divergences, and the Histogram Trick. In submission.Google ScholarGoogle Scholar
  2. Program for TPC-H data generation with Skew. ftp://ftp.research.microsoft.com/users/viveknar/TPCDSkew/.Google ScholarGoogle Scholar
  3. TPC-H and TPC-DS Benchmarks. http://www.tpc.org.Google ScholarGoogle Scholar
  4. WEKA workbench. http://www.cs.waikato.ac.nz/ml/weka/.Google ScholarGoogle Scholar
  5. A. Aboulnaga and S. Chaudhuri. Self-tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD, pages 181--192, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Ahmad, S. Duan, A. Aboulnaga, and S. Babu. Predicting Completion Times of Batch Query Workloads using Interaction-aware Models and Simulation. In EDBT, pages 449--460, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. Zdonik. The Case for Predictive Database Systems: Opportunities and Challenges. In CIDR, pages 167--174, 2011.Google ScholarGoogle Scholar
  8. M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. Zdonik. Learning-based Query Performance Modeling and Prediction. In ICDE, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. Bruno, S. Chaudhuri, and L. Gravano. STHoles: A Multidimensional Workload-Aware Histogram. In SIGMOD, pages 211--222, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Curino, E. P. C. Jones, S. Madden, and H. Balakrishnan. Workload-aware Database Monitoring and Consolidation. In SIGMOD, pages 313--324, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. J. DeWitt, J. F. Naughton, and J. Burger. Nested Loops Revisited. In PDIS, pages 230--242, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Duggan, U. Cetintemel, O. Papaemmanouil, and E. Upfal. Performance Prediction for Concurrent Database Workloads. In SIGMOD, pages 337--348, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Elhemali, C. A. Galindo-Legaria, T. Grabs, and M. M. Joshi. Execution Strategies for SQL Subqueries. In SIGMOD, pages 993--1004, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Friedman. Greedy Function Approximation: a Gradient Boosting Machine. Annals of Statistics, 29(5):1189--1232, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  15. A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In ICDE, pages 592--603, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Graefe. Query Evaluation Techniques for Large Databases. ACM Comput. Surv., 25(2):73--169, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H. Herodotou and S. Babu. Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs. In VLDB, pages 1111--1122, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. E. P. D. Pednault. Transform Regression and the Kolmogorov Superposition Theorem. In SDM, pages 35--46, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  19. M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - DB2's LEarning Optimizer. In VLDB, pages 19--28, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. V. Vapnik. Statistical Learning Theory. Wiley, 2000.Google ScholarGoogle Scholar
  21. Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Ranking, Boosting, and Model Adaptation. Technical report, Microsoft Research, 2008.Google ScholarGoogle Scholar
  22. N. Zhang, P. J. Haas, V. Josifovski, G. M. Lohman, and C. Zhang. Statistical Learning Techniques for Costing XML Queries. In VLDB, pages 289--300, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 5, Issue 11
    July 2012
    608 pages

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 July 2012
    Published in pvldb Volume 5, Issue 11

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader