Abstract
Predicting query execution time is crucial for many database management tasks including admission control, query scheduling, and progress monitoring. While a number of recent papers have explored this problem, the bulk of the existing work either considers prediction for a single query, or prediction for a static workload of concurrent queries, where by "static" we mean that the queries to be run are fixed and known. In this paper, we consider the more general problem of dynamic concurrent workloads. Unlike most previous work on query execution time prediction, our proposed framework is based on analytic modeling rather than machine learning. We first use the optimizer's cost model to estimate the I/O and CPU requirements for each pipeline of each query in isolation, and then use a combination queueing model and buffer pool model that merges the I/O and CPU requests from concurrent queries to predict running times. We compare the proposed approach with a machine-learning based approach that is a variant of previous work. Our experiments show that our analytic-model based approach can lead to competitive and often better prediction accuracy than its machine-learning based counterpart.
- M. Ahmad, A. Aboulnaga, S. Babu, and K. Munagala. Interaction-aware scheduling of report-generation workloads. The VLDB Journal, 20:589-615, 2011. Google Scholar
- M. Ahmad, S. Duan, A. Aboulnaga, and S. Babu. Predicting completion times of batch query workloads using interaction-aware models and simulation. In EDBT, pages 449-460, 2011. Google Scholar
- M. Akdere, U. Çetintemel, M. Riondato, E. Upfal, and S. B. Zdonik. Learning-based query performance modeling and prediction. In ICDE, pages 390-401, 2012. Google Scholar
- F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1-48, 2002. Google Scholar
- S. Chaudhuri, V. R. Narasayya, and R. Ramamurthy. Estimating progress of execution for SQL queries. In SIGMOD, 2004. Google Scholar
- J. Duggan, U. Çetintemel, O. Papaemmanouil, and E. Upfal. Performance prediction for concurrent database workloads. In SIGMOD, pages 337-348, 2011. Google Scholar
- J. H. Friedman. Stochastic gradient boosting. Comput. Stat. Data Anal., 38(4):367-378, 2002. Google Scholar
- A. Ganapathi, H. A. Kuno, U. Dayal, J. L.Wiener, A. Fox, M. I. Jordan, and D. A. Patterson. Predicting multiple metrics for queries: Better decisions enabled by machine learning. In ICDE, 2009. Google Scholar
- G. Graefe. Query evaluation techniques for large databases. ACM Comput. Surv., 25(2):73-170, 1993. Google Scholar
- S. Guirguis, M. A. Sharaf, P. K. Chrysanthis, A. Labrinidis, and K. Pruhs. Adaptive scheduling of web transactions. In ICDE, 2009. Google Scholar
- M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 11(1):10-18, 2009. Google Scholar
- E. D. Lazowska, J. Zahorjan, G. S. Graham, and K. C. Sevcik. Quantitative system performance - computer system analysis using queueing network models. Prentice Hall, 1984. Google Scholar
- J. Li, A. C. König, V. R. Narasayya, and S. Chaudhuri. Robust estimation of resource consumption for sql queries using statistical techniques. PVLDB, 5(11):1555-1566, 2012. Google Scholar
- G. Luo, J. F. Naughton, C. J. Ellmann, and M. Watzke. Toward a progress indicator for database queries. In SIGMOD, 2004. Google Scholar
- C. Mishra and N. Koudas. The design of a query monitoring system. ACM Trans. Database Syst., 34(1):1-51, 2009. Google Scholar
- V. F. Nicola, A. Dan, and D. M. Dias. Analysis of the generalized clock buffer replacement scheme for database transaction processing. In SIGMETRICS, pages 35-46, 1992. Google Scholar
- R. Osman, I. Awan, and M. E. Woodward. Application of queueing network models in the performance evaluation of database designs. Electr. Notes Theor. Comput. Sci., 232:101-124, 2009. Google Scholar
- J. R. Quinlan. Simplifying decision trees, 1986.Google Scholar
- R. Ramamurthy and D. J. DeWitt. Buffer-pool aware query optimization. In CIDR, pages 250-261, 2005.Google Scholar
- M. Reiser and S. S. Lavenberg. Mean-value analysis of closed multichain queuing networks. J. ACM, 27(2):313-322, 1980. Google Scholar
- Scilab Enterprises. Scilab: Free and Open Source software for numerical computation. Scilab Enterprises, Orsay, France, 2012.Google Scholar
- K. C. Sevcik. Data base system performance prediction using an analytical model (invited paper). In VLDB, pages 182-198, 1981. Google Scholar
- R. Suri, S. Sahu, and M. Vernon. Approximate mean value analysis for closed queueing networks with multiple-server stations. In Proc. of Industrial Engineering Research Conf. (IERC), 2007.Google Scholar
- N. Tomov, E. W. Dempster, M. H. Williams, A. Burger, H. Taylor, P. J. B. King, and P. Broughton. Analytical response time estimation in parallel relational database systems. Parallel Computing, 30(2):249-283, 2004. Google Scholar
- S. Tozer, T. Brecht, and A. Aboulnaga. Q-Cop: Avoiding bad query mixes to minimize client timeouts under heavy loads. In ICDE, 2010.Google Scholar
- T. J. Wasserman, P. Martin, D. B. Skillicorn, and H. Rizvi. Developing a characterization of business intelligence workloads for sizing new database systems. In DOLAP, pages 7-13, 2004. Google Scholar
- W. Wu, Y. Chi, H. Hacigümüs, and J. F. Naughton. Towards predicting query execution time for concurrent and dynamic database workloads. Technical Report TR-NECLA-DM-2013-13, NEC Laboratories America, 2013.Google Scholar
- W. Wu, Y. Chi, S. Zhu, J. Tatemura, H. Hacigümüs, and J. F. Naughton. Predicting query execution time: are optimizer cost models really unusable? In ICDE, pages 1081-1092, 2013. Google Scholar
Index Terms
- Towards predicting query execution time for concurrent and dynamic database workloads
Recommendations
Predicting SQL Query Execution Time for Large Data Volume
IDEAS '16: Proceedings of the 20th International Database Engineering & Applications SymposiumIn a production system, increase in data size will increase the execution time of the application's SQL queries and degrade its performance. Tuning SQL queries in production requires additional efforts and cost. Time constraints during application ...
Performance prediction for concurrent database workloads
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of dataCurrent trends in data management systems, such as cloud and multi-tenant databases, are leading to data processing environments that concurrently execute heterogeneous query workloads. At the same time, these systems need to satisfy diverse performance ...
A QueryRating-Based Statistical Model for Predicting Concurrent Query Response Time
Web Information Systems and ApplicationsAbstractPredicting query response time plays an important role in managing database systems. It can be used for tasks such as query scheduling, resource allocation, and system capacity planning. Due to the uncertainty of database systems, especially query ...
Comments