Published in: World Wide Web 4/2018

24.10.2017

Learning-based SPARQL query performance modeling and prediction

Authors: Wei Emma Zhang, Quan Z. Sheng, Yongrui Qin, Kerry Taylor, Lina Yao

Abstract

One of the challenges of managing an RDF database is predicting the performance of SPARQL queries before they are executed. Performance characteristics, such as execution time and memory usage, can help data consumers identify unexpectedly long-running queries before they start and estimate the system workload for query scheduling. Extensive work addresses this performance prediction problem for traditional SQL queries, but it is not directly applicable to SPARQL queries. In this paper, we adopt machine learning techniques to predict the performance of SPARQL queries. Our work focuses on modeling the features of a SPARQL query as a vector representation. Our feature modeling method does not depend on knowledge of the underlying systems or the structure of the underlying data, but only on the nature of SPARQL queries. We then use these features to train prediction models. We propose a two-step prediction process and consider performance in both cold and warm stages. Evaluations are performed on real-world SPARQL queries whose execution times range from milliseconds to hours. The results demonstrate that the proposed approach can effectively predict SPARQL query performance and outperforms state-of-the-art approaches.
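
As a rough illustration of the feature-vector idea described in the abstract, the following Python sketch maps a SPARQL query string to a numeric vector and estimates its execution time with a plain k-nearest-neighbour average. The feature set (keyword counts, distinct variables, number of group patterns), the function names query_to_vector and knn_predict, and the choice of k-NN are assumptions made for illustration only; the abstract does not specify the paper's actual features or models, and the two-step prediction process is not reproduced here.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical feature set: a handful of SPARQL keywords whose counts are
# used as features. This is NOT the feature set of the paper.
KEYWORDS = ("OPTIONAL", "UNION", "FILTER", "DISTINCT", "ORDER", "LIMIT")


def query_to_vector(query: str) -> List[float]:
    """Map a SPARQL query string to a fixed-length numeric vector.

    Only the query text is inspected; no knowledge of the RDF store or
    of the data is required.
    """
    upper = query.upper()
    # One feature per keyword: how often it occurs as a whole word.
    features = [float(len(re.findall(r"\b" + kw + r"\b", upper)))
                for kw in KEYWORDS]
    # Number of distinct variables (?x or $x), taken from the original text
    # because SPARQL variable names are case-sensitive.
    features.append(float(len(set(re.findall(r"[?$]\w+", query)))))
    # Number of opening braces, a crude proxy for nested group patterns.
    features.append(float(query.count("{")))
    return features


def knn_predict(train: Sequence[Tuple[List[float], float]],
                vec: List[float], k: int = 3) -> float:
    """Average the labels of the k training vectors closest to vec.

    Distance is Euclidean; train must be non-empty.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], vec))[:k]
    return sum(label for _, label in nearest) / len(nearest)


if __name__ == "__main__":
    # Toy training set with invented execution times (seconds),
    # purely to show how the two functions fit together.
    history = [
        ("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10", 0.01),
        ("SELECT ?s WHERE { ?s ?p ?o . OPTIONAL { ?s ?q ?x } }", 0.50),
        ("SELECT DISTINCT ?s WHERE { { ?s ?p ?o } UNION { ?o ?p ?s } }", 2.00),
    ]
    train = [(query_to_vector(q), t) for q, t in history]
    new_query = "SELECT ?a WHERE { ?a ?b ?c . OPTIONAL { ?a ?d ?e } } LIMIT 5"
    print(knn_predict(train, query_to_vector(new_query), k=2))
```

In practice the features would likely be normalised before distances are computed, and the regressor could be swapped for any other model trained on the same vectors.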

Metadata
Title
Learning-based SPARQL query performance modeling and prediction
Authors
Wei Emma Zhang
Quan Z. Sheng
Yongrui Qin
Kerry Taylor
Lina Yao
Publication date
24.10.2017
Publisher
Springer US
Published in
World Wide Web / Issue 4/2018
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-017-0498-1
