Abstract
Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. Finally, we investigate plan enumeration techniques comparing exhaustive dynamic programming with heuristic algorithms and find that exhaustive enumeration improves performance despite the sub-optimal cardinality estimates.
- R. Ahmed, R. Sen, M. Poess, and S. Chakkappen. Of snowstorms and bushy trees. PVLDB, 7(13):1452--1461, 2014. Google ScholarDigital Library
- B. Babcock and S. Chaudhuri. Towards a robust query optimizer: A principled and practical approach. In SIGMOD, pages 119--130, 2005. Google ScholarDigital Library
- S. Bellamkonda, H.-G. Li, U. Jagtap, Y. Zhu, V. Liang, and T. Cruanes. Adaptive and big data scale parallel execution in Oracle. PVLDB, 6(11):1102--1113, 2013. Google ScholarDigital Library
- R. Borovica-Gajic, S. Idreos, A. Ailamaki, M. Zukowski, and C. Fraser. Smooth scan: Statistics-oblivious access paths. In ICDE, pages 315--326, 2015.Google ScholarCross Ref
- N. Bruno, C. A. Galindo-Legaria, and M. Joshi. Polynomial heuristics for query optimization. In ICDE, pages 589--600, 2010.Google ScholarCross Ref
- S. Chaudhuri. Query optimizers: time to rethink the contract? In SIGMOD, pages 961--968, 2009. Google ScholarDigital Library
- S. Chaudhuri, V. R. Narasayya, and R. Ramamurthy. Exact cardinality query optimization for optimizer testing. PVLDB, 2(1):994--1005, 2009. Google ScholarDigital Library
- M. Colgan. Oracle adaptive joins. https://blogs.oracle.com/optimizer/entry/what_s_new_in_12c, 2013.Google Scholar
- A. Dutt and J. R. Haritsa. Plan bouquets: query processing without selectivity estimation. In SIGMOD, pages 1039--1050, 2014. Google ScholarDigital Library
- C. Estan and J. F. Naughton. End-biased samples for join cardinality estimation. In ICDE, page 20, 2006. Google ScholarDigital Library
- L. Fegaras. A new heuristic for optimizing large queries. In DEXA, pages 726--735, 1998. Google ScholarDigital Library
- P. Fender and G. Moerkotte. Counter strike: Generic top-down join enumeration for hypergraphs. PVLDB, 6(14):1822--1833, 2013. Google ScholarDigital Library
- P. Fender, G. Moerkotte, T. Neumann, and V. Leis. Effective and robust pruning for top-down join enumeration algorithms. In ICDE, pages 414--425, 2012. Google ScholarDigital Library
- C. Fraser, L. Giakoumakis, V. Hamine, and K. F. Moore-Smith. Testing cardinality estimation models in SQL Server. In DBtest, 2012. Google ScholarDigital Library
- G. Graefe. A generalized join algorithm. In BTW, pages 267--286, 2011.Google Scholar
- Z. Gu, M. A. Soliman, and F. M. Waas. Testing the accuracy of query optimizers. In DBTest, 2012. Google ScholarDigital Library
- P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on random sampling. J Computer System Science, 52(3):550--569, 1996. Google ScholarDigital Library
- J. R. Haritsa. The Picasso database query optimizer visualizer. PVLDB, 3(2):1517--1520, 2010. Google ScholarDigital Library
- I. F. Ilyas, V. Markl, P. J. Haas, P. Brown, and A. Aboulnaga. CORDS: automatic discovery of correlations and soft functional dependencies. In SIGMOD, pages 647--658, 2004. Google ScholarDigital Library
- Y. E. Ioannidis. The history of histograms (abridged). In VLDB, pages 19--30, 2003. Google ScholarDigital Library
- Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, 1991. Google ScholarDigital Library
- R. A. Kader, P. A. Boncz, S. Manegold, and M. van Keulen. ROX: run-time optimization of XQueries. In SIGMOD, pages 615--626, 2009. Google ScholarDigital Library
- R. Kaushik, C. Ré, and D. Suciu. General database statistics using entropy maximization. In DBPL, pages 84--99, 2009. Google ScholarDigital Library
- Q. Li, M. Shao, V. Markl, K. S. Beyer, L. S. Colby, and G. M. Lohman. Adaptively reordering joins during query execution. In ICDE, pages 26--35, 2007.Google ScholarCross Ref
- F. Liu and S. Blanas. Forecasting the cost of processing multi-join queries via hashing for main-memory databases. In SoCC, pages 153--166, 2015. Google ScholarDigital Library
- G. Lohman. Is query optimization a solved problem? http://wp.sigmod.org/?p=1075, 2014.Google Scholar
- L. F. Mackert and G. M. Lohman. R* optimizer validation and performance evaluation for local queries. In SIGMOD, pages 84--95, 1986. Google ScholarDigital Library
- V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, pages 373--384, 2005. Google ScholarDigital Library
- G. Moerkotte and T. Neumann. Dynamic programming strikes back. In SIGMOD, pages 539--552, 2008. Google ScholarDigital Library
- G. Moerkotte, T. Neumann, and G. Steidl. Preventing bad plans by bounding the impact of cardinality estimation errors. PVLDB, 2(1):982--993, 2009. Google ScholarDigital Library
- I. Müller, P. Sanders, A. Lacurie, W. Lehner, and F. Färber. Cache-efficient aggregation: Hashing is sorting. In SIGMOD, pages 1123--1136, 2015. Google ScholarDigital Library
- T. Neumann. Query simplification: graceful degradation for join-order optimization. In SIGMOD, pages 403--414, 2009. Google ScholarDigital Library
- T. Neumann and C. A. Galindo-Legaria. Taking the edge off cardinality estimation errors using incremental execution. In BTW, pages 73--92, 2013.Google Scholar
- V. Poosala and Y. E. Ioannidis. Selectivity estimation without the attribute value independence assumption. In VLDB, pages 486--495, 1997. Google ScholarDigital Library
- F. Rusu and A. Dobra. Sketches for size of join estimation. TODS, 33(3), 2008. Google ScholarDigital Library
- P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In SIGMOD, pages 23--34, 1979. Google ScholarDigital Library
- M. Steinbrunn, G. Moerkotte, and A. Kemper. Heuristic and randomized optimization for the join ordering problem. VLDB J., 6(3):191--208, 1997. Google ScholarDigital Library
- M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - DB2's learning optimizer. In VLDB, pages 19--28, 2001. Google ScholarDigital Library
- K. Tzoumas, A. Deshpande, and C. S. Jensen. Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB, 4(11):852--863, 2011.Google ScholarDigital Library
- F. Waas and A. Pellenkoft. Join order selection - good enough is easy. In BNCOD, pages 51--67, 2000. Google ScholarDigital Library
- F. M. Waas, L. Giakoumakis, and S. Zhang. Plan space analysis: an early warning system to detect plan regressions in cost-based optimizers. In DBTest, 2011. Google ScholarDigital Library
- W. Wu, Y. Chi, S. Zhu, J. Tatemura, H. Hacigümüs, and J. F. Naughton. Predicting query execution time: Are optimizer cost models really unusable? In ICDE, pages 1081--1092, 2013. Google ScholarDigital Library
- F. Yu, W. Hou, C. Luo, D. Che, and M. Zhu. CS2: a new database synopsis for query estimation. In SIGMOD, pages 469--480, 2013. Google ScholarDigital Library
Index Terms
- How good are query optimizers, really?
Recommendations
View-based query containment
PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsQuery containment is the problem of checking whether for all databases the answer to a query is a subset of the answer to a second query. In several data management tasks, such as data integration, mobile computing, etc., the data of interest are only ...
Query Folding
ICDE '96: Proceedings of the Twelfth International Conference on Data EngineeringQuery folding refers to the activity of determining if and how a query can be answered using a given set of resources, which might be materialized views, cached results of previous queries, or queries answerable by other databases. We investigate query ...
Comments