Skip to main content
Erschienen in: The VLDB Journal 5/2018

18.09.2017 | Special Issue Paper

Query optimization through the looking glass, and what we found running the Join Order Benchmark

verfasst von: Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, Thomas Neumann

Erschienen in: The VLDB Journal | Ausgabe 5/2018

Einloggen

Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark that works on real-life data riddled with correlations and introduces 113 complex join queries. We experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. For this purpose, we describe cardinality-estimate injection and extraction techniques that allow us to compare the cardinality estimators of multiple industrial SQL implementations on equal footing, and to characterize the value of having perfect cardinality estimates. Our investigation shows that all industrial-strength cardinality estimators routinely produce large errors: though cardinality estimation using table samples solves the problem for single-table queries, there are still no techniques in industrial systems that can deal accurately with join-crossing correlated query predicates. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. We investigate plan enumeration techniques comparing exhaustive dynamic programming with heuristic algorithms and find that exhaustive enumeration improves performance despite the suboptimal cardinality estimates. Finally, we extend our investigation from main-memory only, to also include disk-based query processing. Here, we find that though accurate cardinality estimation should be the first priority, other aspects such as modeling random versus sequential I/O are also important to predict query runtime.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
4
Since in this paper we do not model or investigate aggregation, we omitted GROUP BY from our queries. To avoid communication from becoming the performance bottleneck for queries with large result sizes, we wrap all attributes in the projection clause with MIN(...) expressions when executing (but not when estimating). This change has no effect on PostgreSQL’s join order selection because its optimizer does not push down aggregations.
 
5
For our workload, it was still feasible to do this naïvely. For larger data sets, the approach by Chaudhuri et al. [8] may become necessary.
 
6
The sample is stored in the DataBlock [26] format, which enables fast scans (and therefore fast estimation). The sample is (re-)generated when computing the statistics of a table.
 
7
The reasons for this surprising behavior are two implementation artifacts: First, estimates that are less than 1 are rounded up to 1, making subexpression estimates sensitive to the (usually arbitrary) join enumeration order, which is affected by the from clause. The second is a consistency problem caused by incorrect domain sizes of predicate attributes in joins with multiple predicates.
 
8
In fact, rather than relational table partitioning, there was a separate XML document per venue, e.g., separate documents for SIGMOD, VLDB, Nature and a few thousand more venues. Storage in a separate XML document has roughly the same effect on access paths as partitioned tables.
 
9
We did not run extensive experiments to find the “best” parameter and do not claim that 50 is the best setting. Our goal is to measure the effect of adjusting cost parameters into a significantly more accurate direction to determine how important it is to tune the cost model.
 
10
Since the two query variants only differ in a constant within a selection predicate, they could be executed using the same prepared statement. Not statically knowing all constants statically presents additional, important, and well-researched challenges. However, we do not consider prepared statements in this work and always send the full query text.
 
11
We also ran experiments with a 128 MB buffer pool where we observed results that lie between the in-memory and the small buffer pool configuration.
 
12
echo 3> /proc/sys/vm/drop_caches.
 
Literatur
1.
Zurück zum Zitat Ahmed, R., Sen, R., Poess, M., Chakkappen, S.: Of snowstorms and bushy trees. PVLDB 7(13), 1452–1461 (2014) Ahmed, R., Sen, R., Poess, M., Chakkappen, S.: Of snowstorms and bushy trees. PVLDB 7(13), 1452–1461 (2014)
2.
Zurück zum Zitat Babcock, B., Chaudhuri, S.: Towards a robust query optimizer: a principled and practical approach. In: SIGMOD, pp. 119–130 (2005) Babcock, B., Chaudhuri, S.: Towards a robust query optimizer: a principled and practical approach. In: SIGMOD, pp. 119–130 (2005)
3.
Zurück zum Zitat Bellamkonda, S., Li, H.G., Jagtap, U., Zhu, Y., Liang, V., Cruanes, T.: Adaptive and big data scale parallel execution in Oracle. PVLDB 6(11), 1102–1113 (2013) Bellamkonda, S., Li, H.G., Jagtap, U., Zhu, Y., Liang, V., Cruanes, T.: Adaptive and big data scale parallel execution in Oracle. PVLDB 6(11), 1102–1113 (2013)
4.
Zurück zum Zitat Boncz, P.A., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: TPCTC, pp. 61–76 (2013) Boncz, P.A., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: TPCTC, pp. 61–76 (2013)
5.
Zurück zum Zitat Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan: statistics-oblivious access paths. In: ICDE, pp. 315–326 (2015) Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan: statistics-oblivious access paths. In: ICDE, pp. 315–326 (2015)
6.
Zurück zum Zitat Bruno, N., Galindo-Legaria, C.A., Joshi, M.: Polynomial heuristics for query optimization. In: ICDE, pp. 589–600 (2010) Bruno, N., Galindo-Legaria, C.A., Joshi, M.: Polynomial heuristics for query optimization. In: ICDE, pp. 589–600 (2010)
7.
Zurück zum Zitat Chaudhuri, S.: Query optimizers: time to rethink the contract? In: SIGMOD, pp. 961–968 (2009) Chaudhuri, S.: Query optimizers: time to rethink the contract? In: SIGMOD, pp. 961–968 (2009)
8.
Zurück zum Zitat Chaudhuri, S., Narasayya, V.R., Ramamurthy, R.: Exact cardinality query optimization for optimizer testing. PVLDB 2(1), 994–1005 (2009) Chaudhuri, S., Narasayya, V.R., Ramamurthy, R.: Exact cardinality query optimization for optimizer testing. PVLDB 2(1), 994–1005 (2009)
10.
Zurück zum Zitat Dutt, A., Haritsa, J.R.: Plan bouquets: query processing without selectivity estimation. In: SIGMOD, pp. 1039–1050 (2014) Dutt, A., Haritsa, J.R.: Plan bouquets: query processing without selectivity estimation. In: SIGMOD, pp. 1039–1050 (2014)
11.
Zurück zum Zitat Estan, C., Naughton, J.F.: End-biased samples for join cardinality estimation. In: ICDE, p. 20 (2006) Estan, C., Naughton, J.F.: End-biased samples for join cardinality estimation. In: ICDE, p. 20 (2006)
12.
Zurück zum Zitat Fegaras, L.: A new heuristic for optimizing large queries. In: DEXA, pp. 726–735 (1998) Fegaras, L.: A new heuristic for optimizing large queries. In: DEXA, pp. 726–735 (1998)
13.
Zurück zum Zitat Fender, P., Moerkotte, G.: Counter strike: generic top-down join enumeration for hypergraphs. PVLDB 6(14), 1822–1833 (2013) Fender, P., Moerkotte, G.: Counter strike: generic top-down join enumeration for hypergraphs. PVLDB 6(14), 1822–1833 (2013)
14.
Zurück zum Zitat Fender, P., Moerkotte, G., Neumann, T., Leis, V.: Effective and robust pruning for top-down join enumeration algorithms. In: ICDE, pp. 414–425 (2012) Fender, P., Moerkotte, G., Neumann, T., Leis, V.: Effective and robust pruning for top-down join enumeration algorithms. In: ICDE, pp. 414–425 (2012)
15.
Zurück zum Zitat Fraser, C., Giakoumakis, L., Hamine, V., Moore-Smith, K.F.: Testing cardinality estimation models in SQL Server. In: DBtest (2012) Fraser, C., Giakoumakis, L., Hamine, V., Moore-Smith, K.F.: Testing cardinality estimation models in SQL Server. In: DBtest (2012)
16.
Zurück zum Zitat Graefe, G.: A generalized join algorithm. In: BTW, pp. 267–286 (2011) Graefe, G.: A generalized join algorithm. In: BTW, pp. 267–286 (2011)
17.
Zurück zum Zitat Gu, Z., Soliman, M.A., Waas, F.M.: Testing the accuracy of query optimizers. In: DBTest (2012) Gu, Z., Soliman, M.A., Waas, F.M.: Testing the accuracy of query optimizers. In: DBTest (2012)
18.
Zurück zum Zitat Haas, P.J., Naughton, J.F., Seshadri, S., Swami, A.N.: Selectivity and cost estimation for joins based on random sampling. J. Comput. Syst. Sci. 52(3), 550–569 (1996)MathSciNetCrossRef Haas, P.J., Naughton, J.F., Seshadri, S., Swami, A.N.: Selectivity and cost estimation for joins based on random sampling. J. Comput. Syst. Sci. 52(3), 550–569 (1996)MathSciNetCrossRef
19.
Zurück zum Zitat Haritsa, J.R.: The Picasso database query optimizer visualizer. PVLDB 3(2), 1517–1520 (2010) Haritsa, J.R.: The Picasso database query optimizer visualizer. PVLDB 3(2), 1517–1520 (2010)
20.
Zurück zum Zitat Ilyas, I.F., Markl, V., Haas, P.J., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: SIGMOD, pp. 647–658 (2004) Ilyas, I.F., Markl, V., Haas, P.J., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: SIGMOD, pp. 647–658 (2004)
21.
Zurück zum Zitat Ioannidis, Y.E.: The history of histograms (abridged). In: VLDB, pp. 19–30 (2003)CrossRef Ioannidis, Y.E.: The history of histograms (abridged). In: VLDB, pp. 19–30 (2003)CrossRef
22.
Zurück zum Zitat Ioannidis, Y.E., Christodoulakis, S.: On the propagation of errors in the size of join results. In: SIGMOD (1991) Ioannidis, Y.E., Christodoulakis, S.: On the propagation of errors in the size of join results. In: SIGMOD (1991)
23.
Zurück zum Zitat Kader, R.A., Boncz, P.A., Manegold, S., van Keulen, M.: ROX: run-time optimization of XQueries. In: SIGMOD, pp. 615–626 (2009) Kader, R.A., Boncz, P.A., Manegold, S., van Keulen, M.: ROX: run-time optimization of XQueries. In: SIGMOD, pp. 615–626 (2009)
24.
Zurück zum Zitat Kaushik, R., Ré, C., Suciu, D.: General database statistics using entropy maximization. In: DBPL, pp. 84–99 (2009) Kaushik, R., Ré, C., Suciu, D.: General database statistics using entropy maximization. In: DBPL, pp. 84–99 (2009)
25.
Zurück zum Zitat Kester, M.S., Athanassoulis, M., Idreos, S.: Access path selection in main-memory optimized data systems: Should I scan or should I probe? In: SIGMOD (2017) Kester, M.S., Athanassoulis, M., Idreos, S.: Access path selection in main-memory optimized data systems: Should I scan or should I probe? In: SIGMOD (2017)
26.
Zurück zum Zitat Lang, H., Mühlbauer, T., Funke, F., Boncz, P.A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD, pp. 311–326 (2016) Lang, H., Mühlbauer, T., Funke, F., Boncz, P.A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD, pp. 311–326 (2016)
27.
Zurück zum Zitat Leis, V., Boncz, P., Kemper, A., Neumann, T.: Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In: SIGMOD (2014) Leis, V., Boncz, P., Kemper, A., Neumann, T.: Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In: SIGMOD (2014)
28.
Zurück zum Zitat Leis, V., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: How good are query optimizers, really? PVLDB 9(3), 204–215 (2015) Leis, V., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: How good are query optimizers, really? PVLDB 9(3), 204–215 (2015)
29.
Zurück zum Zitat Leis, V., Kundhikanjana, K., Kemper, A., Neumann, T.: Efficient processing of window functions in analytical SQL queries. PVLDB 8(10), 1058 (2015) Leis, V., Kundhikanjana, K., Kemper, A., Neumann, T.: Efficient processing of window functions in analytical SQL queries. PVLDB 8(10), 1058 (2015)
30.
Zurück zum Zitat Leis, V., Radke, B., Gubichev, A., Kemper, A., Neumann, T.: Cardinality estimation done right: index-based join sampling. In: CIDR (2017) Leis, V., Radke, B., Gubichev, A., Kemper, A., Neumann, T.: Cardinality estimation done right: index-based join sampling. In: CIDR (2017)
31.
Zurück zum Zitat Li, Q., Shao, M., Markl, V., Beyer, K.S., Colby, L.S., Lohman, G.M.: Adaptively reordering joins during query execution. In: ICDE, pp. 26–35 (2007) Li, Q., Shao, M., Markl, V., Beyer, K.S., Colby, L.S., Lohman, G.M.: Adaptively reordering joins during query execution. In: ICDE, pp. 26–35 (2007)
32.
Zurück zum Zitat Liu, F., Blanas, S.: Forecasting the cost of processing multi-join queries via hashing for main-memory databases. In: SoCC, pp. 153–166 (2015) Liu, F., Blanas, S.: Forecasting the cost of processing multi-join queries via hashing for main-memory databases. In: SoCC, pp. 153–166 (2015)
34.
Zurück zum Zitat Mackert, L.F., Lohman, G.M.: R* optimizer validation and performance evaluation for local queries. In: SIGMOD, pp. 84–95 (1986) Mackert, L.F., Lohman, G.M.: R* optimizer validation and performance evaluation for local queries. In: SIGMOD, pp. 84–95 (1986)
35.
Zurück zum Zitat Markl, V., Megiddo, N., Kutsch, M., Tran, T.M., Haas, P.J., Srivastava, U.: Consistently estimating the selectivity of conjuncts of predicates. In: VLDB, pp. 373–384 (2005) Markl, V., Megiddo, N., Kutsch, M., Tran, T.M., Haas, P.J., Srivastava, U.: Consistently estimating the selectivity of conjuncts of predicates. In: VLDB, pp. 373–384 (2005)
36.
Zurück zum Zitat Moerkotte, G., Neumann, T.: Dynamic programming strikes back. In: SIGMOD, pp. 539–552 (2008) Moerkotte, G., Neumann, T.: Dynamic programming strikes back. In: SIGMOD, pp. 539–552 (2008)
37.
Zurück zum Zitat Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. PVLDB 2(1), 982–993 (2009) Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. PVLDB 2(1), 982–993 (2009)
38.
Zurück zum Zitat Müller, I., Sanders, P., Lacurie, A., Lehner, W., Färber, F.: Cache-efficient aggregation: hashing is sorting. In: SIGMOD, pp. 1123–1136 (2015) Müller, I., Sanders, P., Lacurie, A., Lehner, W., Färber, F.: Cache-efficient aggregation: hashing is sorting. In: SIGMOD, pp. 1123–1136 (2015)
39.
Zurück zum Zitat Neumann, T.: Query simplification: graceful degradation for join-order optimization. In: SIGMOD, pp. 403–414 (2009) Neumann, T.: Query simplification: graceful degradation for join-order optimization. In: SIGMOD, pp. 403–414 (2009)
40.
Zurück zum Zitat Neumann, T., Galindo-Legaria, C.A.: Taking the edge off cardinality estimation errors using incremental execution. In: BTW, pp. 73–92 (2013) Neumann, T., Galindo-Legaria, C.A.: Taking the edge off cardinality estimation errors using incremental execution. In: BTW, pp. 73–92 (2013)
41.
Zurück zum Zitat O’Neil, P.E., O’Neil, E.J., Chen, X., Revilak, S.: The star schema benchmark and augmented fact table indexing. In: TPCTC, pp. 237–252 (2009) O’Neil, P.E., O’Neil, E.J., Chen, X., Revilak, S.: The star schema benchmark and augmented fact table indexing. In: TPCTC, pp. 237–252 (2009)
42.
Zurück zum Zitat Poosala, V., Ioannidis, Y.E.: Selectivity estimation without the attribute value independence assumption. In: VLDB, pp. 486–495 (1997) Poosala, V., Ioannidis, Y.E.: Selectivity estimation without the attribute value independence assumption. In: VLDB, pp. 486–495 (1997)
43.
Zurück zum Zitat Pöss, M., Nambiar, R.O., Walrath, D.: Why you should run TPC-DS: a workload analysis. In: PVLDB, pp. 1138–1149 (2007) Pöss, M., Nambiar, R.O., Walrath, D.: Why you should run TPC-DS: a workload analysis. In: PVLDB, pp. 1138–1149 (2007)
44.
Zurück zum Zitat Rusu, F., Dobra, A.: Sketches for size of join estimation. TODS 33(3), 15 (2008)CrossRef Rusu, F., Dobra, A.: Sketches for size of join estimation. TODS 33(3), 15 (2008)CrossRef
45.
Zurück zum Zitat Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD, pp. 23–34 (1979) Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD, pp. 23–34 (1979)
46.
Zurück zum Zitat Steinbrunn, M., Moerkotte, G., Kemper, A.: Heuristic and randomized optimization for the join ordering problem. VLDB J. 6(3), 191–208 (1997)CrossRef Steinbrunn, M., Moerkotte, G., Kemper, A.: Heuristic and randomized optimization for the join ordering problem. VLDB J. 6(3), 191–208 (1997)CrossRef
47.
Zurück zum Zitat Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: LEO—DB2’s learning optimizer. In: VLDB, pp. 19–28 (2001) Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: LEO—DB2’s learning optimizer. In: VLDB, pp. 19–28 (2001)
48.
Zurück zum Zitat Tzoumas, K., Deshpande, A., Jensen, C.S.: Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB 4(11), 852–863 (2011) Tzoumas, K., Deshpande, A., Jensen, C.S.: Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB 4(11), 852–863 (2011)
49.
Zurück zum Zitat Waas, F., Pellenkoft, A.: Join order selection-good enough is easy. In: BNCOD, pp. 51–67 (2000) Waas, F., Pellenkoft, A.: Join order selection-good enough is easy. In: BNCOD, pp. 51–67 (2000)
50.
Zurück zum Zitat Waas, F.M., Giakoumakis, L., Zhang, S.: Plan space analysis: an early warning system to detect plan regressions in cost-based optimizers. In: DBTest (2011) Waas, F.M., Giakoumakis, L., Zhang, S.: Plan space analysis: an early warning system to detect plan regressions in cost-based optimizers. In: DBTest (2011)
51.
Zurück zum Zitat Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüs, H., Naughton, J.F.: Predicting query execution time: are optimizer cost models really unusable? In: ICDE, pp. 1081–1092 (2013) Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüs, H., Naughton, J.F.: Predicting query execution time: are optimizer cost models really unusable? In: ICDE, pp. 1081–1092 (2013)
52.
Zurück zum Zitat Wu, W., Naughton, J.F., Singh, H.: Sampling-based query re-optimization. In: SIGMOD (2016) Wu, W., Naughton, J.F., Singh, H.: Sampling-based query re-optimization. In: SIGMOD (2016)
53.
Zurück zum Zitat Yu, F., Hou, W., Luo, C., Che, D., Zhu, M.: CS2: a new database synopsis for query estimation. In: SIGMOD, pp. 469–480 (2013) Yu, F., Hou, W., Luo, C., Che, D., Zhu, M.: CS2: a new database synopsis for query estimation. In: SIGMOD, pp. 469–480 (2013)
Metadaten
Titel
Query optimization through the looking glass, and what we found running the Join Order Benchmark
verfasst von
Viktor Leis
Bernhard Radke
Andrey Gubichev
Atanas Mirchev
Peter Boncz
Alfons Kemper
Thomas Neumann
Publikationsdatum
18.09.2017
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 5/2018
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-017-0480-7

Weitere Artikel der Ausgabe 5/2018

The VLDB Journal 5/2018 Zur Ausgabe