Skip to main content
Erschienen in:
Buchtitelbild

2019 | OriginalPaper | Buchkapitel

An Approach Based on Bayesian Networks for Query Selectivity Estimation

verfasst von : Max Halford, Philippe Saint-Pierre, Franck Morvan

Erschienen in: Database Systems for Advanced Applications

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The efficiency of a query execution plan depends on the accuracy of the selectivity estimates given to the query optimiser by the cost model. The cost model makes simplifying assumptions in order to produce said estimates in a timely manner. These assumptions lead to selectivity estimation errors that have dramatic effects on the quality of the resulting query execution plans. A convenient assumption that is ubiquitous among current cost models is to assume that attributes are independent with each other. However, it ignores potential correlations which can have a huge negative impact on the accuracy of the cost model. In this paper we attempt to relax the attribute value independence assumption without unreasonably deteriorating the accuracy of the cost model. We propose a novel approach based on a particular type of Bayesian networks called Chow-Liu trees to approximate the distribution of attribute values inside each relation of a database. Our results on the TPC-DS benchmark show that our method is an order of magnitude more precise than other approaches whilst remaining reasonably efficient in terms of time and space.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Specifically we used the following queries: 7, 13, 18, 26, 27, 53, 54, 91.
 
Literatur
1.
Zurück zum Zitat Acharya, S., Gibbons, P.B., Poosala, V., Ramaswamy, S.: Join synopses for approximate query answering. ACM SIGMOD Rec. 28, 275–286 (1999)CrossRef Acharya, S., Gibbons, P.B., Poosala, V., Ramaswamy, S.: Join synopses for approximate query answering. ACM SIGMOD Rec. 28, 275–286 (1999)CrossRef
2.
Zurück zum Zitat Armbrust, M., et al.: Spark SQL: relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1383–1394. ACM (2015) Armbrust, M., et al.: Spark SQL: relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1383–1394. ACM (2015)
3.
Zurück zum Zitat Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. ACM SIGMOD Rec. 30, 211–222 (2001)CrossRef Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. ACM SIGMOD Rec. 30, 211–222 (2001)CrossRef
4.
Zurück zum Zitat Chaudhuri, S., Motwani, R., Narasayya, V.: On random sampling over joins. ACM SIGMOD Rec. 28, 263–274 (1999)CrossRef Chaudhuri, S., Motwani, R., Narasayya, V.: On random sampling over joins. ACM SIGMOD Rec. 28, 263–274 (1999)CrossRef
5.
Zurück zum Zitat Chen, C.M., Roussopoulos, N.: Adaptive selectivity estimation using query feedback, vol. 23. ACM (1994) Chen, C.M., Roussopoulos, N.: Adaptive selectivity estimation using query feedback, vol. 23. ACM (1994)
6.
Zurück zum Zitat Chen, Y., Yi, K.: Two-level sampling for join size estimation. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 759–774. ACM (2017) Chen, Y., Yi, K.: Two-level sampling for join size estimation. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 759–774. ACM (2017)
7.
Zurück zum Zitat Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14(3), 462–467 (1968)MathSciNetCrossRef Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14(3), 462–467 (1968)MathSciNetCrossRef
8.
Zurück zum Zitat Cooper, G.F.: The computational complexity of probabilistic inference using Bayesian belief networks. Artif. Intell. 42(2–3), 393–405 (1990)MathSciNetCrossRef Cooper, G.F.: The computational complexity of probabilistic inference using Bayesian belief networks. Artif. Intell. 42(2–3), 393–405 (1990)MathSciNetCrossRef
9.
Zurück zum Zitat Cowell, R.G., Dawid, P., Lauritzen, S.L., Spiegelhalter, D.J.: Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks. Springer Science & Business Media (2006) Cowell, R.G., Dawid, P., Lauritzen, S.L., Spiegelhalter, D.J.: Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks. Springer Science & Business Media (2006)
10.
Zurück zum Zitat Getoor, L., Taskar, B., Koller, D.: Selectivity estimation using probabilistic models. ACM SIGMOD Rec. 30, 461–472 (2001)CrossRef Getoor, L., Taskar, B., Koller, D.: Selectivity estimation using probabilistic models. ACM SIGMOD Rec. 30, 461–472 (2001)CrossRef
11.
Zurück zum Zitat Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)MATH Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)MATH
12.
Zurück zum Zitat Heimel, M., Kiefer, M., Markl, V.: Self-tuning, GPU-accelerated kernel density models for multidimensional selectivity estimation. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1477–1492. ACM (2015) Heimel, M., Kiefer, M., Markl, V.: Self-tuning, GPU-accelerated kernel density models for multidimensional selectivity estimation. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1477–1492. ACM (2015)
14.
Zurück zum Zitat Hwang, F.K., Richards, D.S., Winter, P.: The Steiner Tree Problem, vol. 53. Elsevier, Amsterdam (1992)MATH Hwang, F.K., Richards, D.S., Winter, P.: The Steiner Tree Problem, vol. 53. Elsevier, Amsterdam (1992)MATH
15.
Zurück zum Zitat Ioannidis, Y.E., Christodoulakis, S.: On the propagation of errors in the size of join results, vol. 20. ACM (1991) Ioannidis, Y.E., Christodoulakis, S.: On the propagation of errors in the size of join results, vol. 20. ACM (1991)
16.
Zurück zum Zitat Ioannidis, Y.E., Christodoulakis, S.: Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. Database Syst. (TODS) 18(4), 709–748 (1993)CrossRef Ioannidis, Y.E., Christodoulakis, S.: Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. Database Syst. (TODS) 18(4), 709–748 (1993)CrossRef
17.
Zurück zum Zitat Jaakkola, T., Sontag, D., Globerson, A., Meila, M.: Learning Bayesian network structure using LP relaxations. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 358–365 (2010) Jaakkola, T., Sontag, D., Globerson, A., Meila, M.: Learning Bayesian network structure using LP relaxations. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 358–365 (2010)
18.
Zurück zum Zitat Jensen, F.V.: An Introduction to Bayesian Networks, vol. 210. UCL Press, London (1996) Jensen, F.V.: An Introduction to Bayesian Networks, vol. 210. UCL Press, London (1996)
19.
Zurück zum Zitat Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., Kemper, A.: Learned cardinalities: estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677 (2018) Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., Kemper, A.: Learned cardinalities: estimating correlated joins with deep learning. arXiv preprint arXiv:​1809.​00677 (2018)
20.
Zurück zum Zitat Kooi, R.P.: The optimization of queries in relational databases (1980) Kooi, R.P.: The optimization of queries in relational databases (1980)
21.
Zurück zum Zitat Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., Neumann, T.: How good are query optimizers, really? Proc. VLDB Endowment 9(3), 204–215 (2015)CrossRef Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., Neumann, T.: How good are query optimizers, really? Proc. VLDB Endowment 9(3), 204–215 (2015)CrossRef
22.
Zurück zum Zitat Leis, V., et al.: Query optimization through the looking glass, and what we found running the join order benchmark. VLDB J. 27, 1–26 (2018)CrossRef Leis, V., et al.: Query optimization through the looking glass, and what we found running the join order benchmark. VLDB J. 27, 1–26 (2018)CrossRef
23.
Zurück zum Zitat Leis, V., Radke, B., Gubichev, A., Kemper, A., Neumann, T.: Cardinality estimation done right: index-based join sampling. In: CIDR (2017) Leis, V., Radke, B., Gubichev, A., Kemper, A., Neumann, T.: Cardinality estimation done right: index-based join sampling. In: CIDR (2017)
24.
Zurück zum Zitat Li, F., Wu, B., Yi, K., Zhao, Z.: Wander join: online aggregation via random walks. In: Proceedings of the 2016 International Conference on Management of Data, pp. 615–629. ACM (2016) Li, F., Wu, B., Yi, K., Zhao, Z.: Wander join: online aggregation via random walks. In: Proceedings of the 2016 International Conference on Management of Data, pp. 615–629. ACM (2016)
25.
Zurück zum Zitat Lipton, R.J., Naughton, J.F.: Query size estimation by adaptive sampling. In: Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 40–46. ACM (1990) Lipton, R.J., Naughton, J.F.: Query size estimation by adaptive sampling. In: Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 40–46. ACM (1990)
26.
Zurück zum Zitat Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. Proc. VLDB Endowment 2(1), 982–993 (2009)CrossRef Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. Proc. VLDB Endowment 2(1), 982–993 (2009)CrossRef
27.
Zurück zum Zitat Muralikrishna, M., DeWitt, D.J.: Equi-depth multidimensional histograms. SIGMOD Rec. 17, 28–36 (1988)CrossRef Muralikrishna, M., DeWitt, D.J.: Equi-depth multidimensional histograms. SIGMOD Rec. 17, 28–36 (1988)CrossRef
29.
Zurück zum Zitat Olken, F.: Random sampling from databases. Ph.D. thesis, University of California, Berkeley (1993) Olken, F.: Random sampling from databases. Ph.D. thesis, University of California, Berkeley (1993)
30.
Zurück zum Zitat Piatetsky-Shapiro, G., Connell, C.: Accurate estimation of the number of tuples satisfying a condition. ACM SIGMOD Rec. 14(2), 256–276 (1984)CrossRef Piatetsky-Shapiro, G., Connell, C.: Accurate estimation of the number of tuples satisfying a condition. ACM SIGMOD Rec. 14(2), 256–276 (1984)CrossRef
31.
Zurück zum Zitat Poess, M., Nambiar, R.O., Walrath, D.: Why you should run TPC-DS: a workload analysis. In: Proceedings of the 33rd International Conference on Very Large Databases, pp. 1138–1149. VLDB Endowment (2007) Poess, M., Nambiar, R.O., Walrath, D.: Why you should run TPC-DS: a workload analysis. In: Proceedings of the 33rd International Conference on Very Large Databases, pp. 1138–1149. VLDB Endowment (2007)
32.
Zurück zum Zitat Poosala, V., Haas, P.J., Ioannidis, Y.E., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. ACM Sigmod Rec. 25, 294–305 (1996)CrossRef Poosala, V., Haas, P.J., Ioannidis, Y.E., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. ACM Sigmod Rec. 25, 294–305 (1996)CrossRef
33.
Zurück zum Zitat Robertson, N., Seymour, P.D.: Graph minors. II: algorithmic aspects of tree-width. J. Algorithms 7(3), 309–322 (1986)MathSciNetCrossRef Robertson, N., Seymour, P.D.: Graph minors. II: algorithmic aspects of tree-width. J. Algorithms 7(3), 309–322 (1986)MathSciNetCrossRef
34.
Zurück zum Zitat Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pp. 23–34. ACM (1979) Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pp. 23–34. ACM (1979)
35.
Zurück zum Zitat Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: Leo-db2’s learning optimizer. VLDB 1, 19–28 (2001) Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: Leo-db2’s learning optimizer. VLDB 1, 19–28 (2001)
36.
Zurück zum Zitat Traverso, M.: Presto: interacting with petabytes of data at Facebook. Retrieved February 4, 2014 (2013) Traverso, M.: Presto: interacting with petabytes of data at Facebook. Retrieved February 4, 2014 (2013)
37.
Zurück zum Zitat Tzoumas, K., Deshpande, A., Jensen, C.S.: Lightweight graphical models for selectivity estimation without independence assumptions. Proc. VLDB Endowment 4(11), 852–863 (2011) Tzoumas, K., Deshpande, A., Jensen, C.S.: Lightweight graphical models for selectivity estimation without independence assumptions. Proc. VLDB Endowment 4(11), 852–863 (2011)
38.
Zurück zum Zitat Vengerov, D., Menck, A.C., Zait, M., Chakkappen, S.P.: Join size estimation subject to filter conditions. Proc. VLDB Endowment 8(12), 1530–1541 (2015)CrossRef Vengerov, D., Menck, A.C., Zait, M., Chakkappen, S.P.: Join size estimation subject to filter conditions. Proc. VLDB Endowment 8(12), 1530–1541 (2015)CrossRef
Metadaten
Titel
An Approach Based on Bayesian Networks for Query Selectivity Estimation
verfasst von
Max Halford
Philippe Saint-Pierre
Franck Morvan
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-18579-4_1