Skip to main content
Top
Published in: World Wide Web 5/2023

28-07-2023

Cardinality estimation with smoothing autoregressive models

Authors: Yuming Lin, Zejun Xu, Yinghao Zhang, You Li, Jingwei Zhang

Published in: World Wide Web | Issue 5/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Cardinality estimation, which aims at accurately estimating the result size of queries, is a fundamental task in database query processing and optimization. One of the most recent and effective solutions to this problem is the use of deep autoregressive models to obtain joint probability distributions through unsupervised learning. However, due to the data sparsity, it is difficult for the estimator to accurately capture the actual distribution, which affects the accuracy of the cardinality estimation. In addition, autoregressive estimators’ progressive sampling characteristics are prone to error propagation, which is more evident in high-dimensional data. To reduce the autoregressive cardinality estimation error and to obtain a better trade-off between estimate accuracy and latency, we propose a random smoothing autoregressive cardinality estimation model (SAM-CE), which uses a random smoothing technique combined with a deep autoregressive model to simplify the learning of joint probability distributions. A smooth progressive sampling method that is suitable for range queries is designed to improve the estimator accuracy by improving the sample quality. We conduct extensive experiments to demonstrate the effectiveness and performance of the proposed SAM-CE. The results show that SAM-CE achieves the state of the art effectiveness of cardinality estimation.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
catalog.data.gov/dataset/vehicle-snowmobile-and-boat-registrations.
 
Literature
2.
go back to reference Leis, V., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: How good are query optimizers, really? Proc. VLDB Endow. 9(3), 204–215 (2015)CrossRef Leis, V., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: How good are query optimizers, really? Proc. VLDB Endow. 9(3), 204–215 (2015)CrossRef
4.
go back to reference Dutt, A., Wang, C., Nazi, A., Kandula, S., Narasayya, V., Chaudhuri, S.: Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment 12(9), 1044–1057 (2019)CrossRef Dutt, A., Wang, C., Nazi, A., Kandula, S., Narasayya, V., Chaudhuri, S.: Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment 12(9), 1044–1057 (2019)CrossRef
6.
go back to reference Bruno, N., Chaudhuri, S., Gravano, L.: Stholes: A multidimensional workload-aware histogram. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pp. 211–222 (2001) Bruno, N., Chaudhuri, S., Gravano, L.: Stholes: A multidimensional workload-aware histogram. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pp. 211–222 (2001)
8.
go back to reference Flajolet, P., Fusy, E., Gandouet, O., Meunier, F.: Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In: Discrete Mathematics and Theoretical Computer Science (2007) Flajolet, P., Fusy, E., Gandouet, O., Meunier, F.: Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In: Discrete Mathematics and Theoretical Computer Science (2007)
11.
go back to reference Lipton, R.J., Naughton, J.F., Schneider, D.A.: Practical selectivity estimation through adaptive sampling. In: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, pp. 1–11 (2022) Lipton, R.J., Naughton, J.F., Schneider, D.A.: Practical selectivity estimation through adaptive sampling. In: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, pp. 1–11 (2022)
12.
go back to reference Poosala, V., Ioannidis, Y.E.: Selectivity estimation without the attribute value independence assumption. In: Jarke, M., Carey, M.J., Dittrich, K.R., Lochovsky, F.H., Loucopoulos, P., Jeusfeld, M.A. (eds.) VLDB’97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece, pp. 486–495. http://www.vldb.org/conf/1997/P486.PDF Poosala, V., Ioannidis, Y.E.: Selectivity estimation without the attribute value independence assumption. In: Jarke, M., Carey, M.J., Dittrich, K.R., Lochovsky, F.H., Loucopoulos, P., Jeusfeld, M.A. (eds.) VLDB’97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece, pp. 486–495. http://​www.​vldb.​org/​conf/​1997/​P486.​PDF
13.
go back to reference Heimel, M., Kiefer, M., Markl, V.: Self-tuning, gpu-accelerated kernel density models or multidimensional selectivity estimation. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1477–1492 Heimel, M., Kiefer, M., Markl, V.: Self-tuning, gpu-accelerated kernel density models or multidimensional selectivity estimation. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1477–1492
14.
go back to reference Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., Binnig, C.: Deepdb: Learn from data, not from queries! Proc. VLDB Endow. 13(7), 992–1005 (2020)CrossRef Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., Binnig, C.: Deepdb: Learn from data, not from queries! Proc. VLDB Endow. 13(7), 992–1005 (2020)CrossRef
15.
go back to reference Wu, P., Cong, G.: A unified deep model of learning from both data and queries for cardinality estimation. In: Proceedings of the 2021 International Conference on Management of Data, pp. 2009–2022 (2021) Wu, P., Cong, G.: A unified deep model of learning from both data and queries for cardinality estimation. In: Proceedings of the 2021 International Conference on Management of Data, pp. 2009–2022 (2021)
17.
go back to reference Hasan, S., Thirumuruganathan, S., Augustine, J., Koudas, N., Das, G.: Deep learning models for selectivity estimation of multi-attribute queries. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1035–1050 (2020) Hasan, S., Thirumuruganathan, S., Augustine, J., Koudas, N., Das, G.: Deep learning models for selectivity estimation of multi-attribute queries. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1035–1050 (2020)
19.
go back to reference To, H., Chiang, K., Shahabi, C.: Entropy-based histograms for selectivity estimation. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pp. 1939–1948 (2013) To, H., Chiang, K., Shahabi, C.: Entropy-based histograms for selectivity estimation. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pp. 1939–1948 (2013)
20.
go back to reference Lynch, C.A.: Selectivity estimation and query optimization in large databases with highly skewed distribution of column values. In: Proceedings of the 14th International Conference on Very Large Data Bases, pp. 240–251 (1998) Lynch, C.A.: Selectivity estimation and query optimization in large databases with highly skewed distribution of column values. In: Proceedings of the 14th International Conference on Very Large Data Bases, pp. 240–251 (1998)
21.
go back to reference Park, Y., Zhong, S., Mozafari, B.: Quicksel: Quick selectivity learning with mixture models. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1017–1033 (2020) Park, Y., Zhong, S., Mozafari, B.: Quicksel: Quick selectivity learning with mixture models. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1017–1033 (2020)
22.
go back to reference Gibbons, P.B.: Distinct sampling for highly-accurate answers to distinct values queries and event reports. In: VLDB, vol. 1, pp. 541–550 (2001) Gibbons, P.B.: Distinct sampling for highly-accurate answers to distinct values queries and event reports. In: VLDB, vol. 1, pp. 541–550 (2001)
23.
go back to reference Haas, P.J., Naughton, J.F., Seshadri, S., Stokes, L.: Sampling-based estimation of the number of distinct values of an attribute. In: VLDB, vol. 95, pp. 311–322 (1995) Haas, P.J., Naughton, J.F., Seshadri, S., Stokes, L.: Sampling-based estimation of the number of distinct values of an attribute. In: VLDB, vol. 95, pp. 311–322 (1995)
25.
go back to reference Spiegel, J., Polyzotis, N.: Graph-based synopses for relational selectivity estimation. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pp. 205–216 (2006) Spiegel, J., Polyzotis, N.: Graph-based synopses for relational selectivity estimation. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pp. 205–216 (2006)
26.
go back to reference Getoor, L., Taskar, B., Koller, D.: Selectivity estimation using probabilistic models. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pp. 461–472 (2001) Getoor, L., Taskar, B., Koller, D.: Selectivity estimation using probabilistic models. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pp. 461–472 (2001)
27.
go back to reference Gunopulos, D., Kollios, G., Tsotras, V.J., Domeniconi, C.: Approximating multi-dimensional aggregate range queries over real attributes. In: Chen, W., Naughton, J.F., Bernstein, P.A. (eds.) Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 16-18, 2000, Dallas, Texas, USA, pp. 463–474 (2000). https://doi.org/10.1145/342009.335448 Gunopulos, D., Kollios, G., Tsotras, V.J., Domeniconi, C.: Approximating multi-dimensional aggregate range queries over real attributes. In: Chen, W., Naughton, J.F., Bernstein, P.A. (eds.) Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 16-18, 2000, Dallas, Texas, USA, pp. 463–474 (2000). https://​doi.​org/​10.​1145/​342009.​335448
28.
go back to reference Lakshmi, M.S., Zhou, S.: Selectivity estimation in extensible databases - A neural network approach. In: Gupta, A., Shmueli, O., Widom, J. (eds.) VLDB’98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA, pp.623–627. http://www.vldb.org/conf/1998/p623.pdf Lakshmi, M.S., Zhou, S.: Selectivity estimation in extensible databases - A neural network approach. In: Gupta, A., Shmueli, O., Widom, J. (eds.) VLDB’98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA, pp.623–627. http://​www.​vldb.​org/​conf/​1998/​p623.​pdf
29.
go back to reference Liu, H., Xu, M., Yu, Z., Corvinelli, V., Zuzarte, C.: Cardinality estimation using neural networks. In: Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, pp. 53–59 Liu, H., Xu, M., Yu, Z., Corvinelli, V., Zuzarte, C.: Cardinality estimation using neural networks. In: Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, pp. 53–59
31.
go back to reference Ortiz, J., Balazinska, M., Gehrke, J., Keerthi, S.S.: An empirical analysis of deep learning for cardinality estimation. CoRR abs/1905.06425 (2019). arXiv:1905.06425 Ortiz, J., Balazinska, M., Gehrke, J., Keerthi, S.S.: An empirical analysis of deep learning for cardinality estimation. CoRR abs/1905.06425 (2019). arXiv:​1905.​06425
32.
go back to reference Zhu, R., Wu, Z., Han, Y., Zeng, K., Pfadler, A., Qian, Z., Zhou, J., Cui, B.: Flat: Fast, lightweight and accurate method for cardinality estimation. Proc. VLDB Endow. 14(9), 1489–1502 (2021)CrossRef Zhu, R., Wu, Z., Han, Y., Zeng, K., Pfadler, A., Qian, Z., Zhou, J., Cui, B.: Flat: Fast, lightweight and accurate method for cardinality estimation. Proc. VLDB Endow. 14(9), 1489–1502 (2021)CrossRef
33.
go back to reference Narayanan, H., Mitter, S.K.: Sample complexity of testing the manifold hypothesis. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a Meeting Held 6-9 December 2010, pp. 1786–1794. Vancouver, British Columbia, Canada (2010) Narayanan, H., Mitter, S.K.: Sample complexity of testing the manifold hypothesis. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a Meeting Held 6-9 December 2010, pp. 1786–1794. Vancouver, British Columbia, Canada (2010)
34.
go back to reference Cornish, R., Caterini, A., Deligiannidis, G., Doucet, A.: Relaxing bijectivity constraints with continuously indexed normalising flows. In: International Conference on Machine Learning, pp. 2133–2143. PMLR (2020) Cornish, R., Caterini, A., Deligiannidis, G., Doucet, A.: Relaxing bijectivity constraints with continuously indexed normalising flows. In: International Conference on Machine Learning, pp. 2133–2143. PMLR (2020)
36.
go back to reference Cohen, J., Rosenfeld, E., Kolter, Z.: Certified adversarial robustness via randomized smoothing. In: International Conference on Machine Learning, pp. 1310–1320. PMLR Cohen, J., Rosenfeld, E., Kolter, Z.: Certified adversarial robustness via randomized smoothing. In: International Conference on Machine Learning, pp. 1310–1320. PMLR
37.
go back to reference Wang, X., Qu, C., Wu, W., Wang, J., Zhou, Q.: Are we ready for learned cardinality estimation? Proc. VLDB Endow. 14(9), 1640–1654 (2021)CrossRef Wang, X., Qu, C., Wu, W., Wang, J., Zhou, Q.: Are we ready for learned cardinality estimation? Proc. VLDB Endow. 14(9), 1640–1654 (2021)CrossRef
38.
go back to reference Poosala, V., Ioannidis, Y.E., Haas, P.J., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. In: Jagadish, H.V., Mumick, I.S. (eds.) Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996, pp. 294–305 (1996). https://doi.org/10.1145/233269.233342 Poosala, V., Ioannidis, Y.E., Haas, P.J., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. In: Jagadish, H.V., Mumick, I.S. (eds.) Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996, pp. 294–305 (1996). https://​doi.​org/​10.​1145/​233269.​233342
40.
go back to reference Germain, M., Gregor, K., Murray, I., Larochelle, H.: MADE: masked autoencoder for distribution estimation. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 881–889 (2015). http://proceedings.mlr.press/v37/germain15.html Germain, M., Gregor, K., Murray, I., Larochelle, H.: MADE: masked autoencoder for distribution estimation. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 881–889 (2015). http://​proceedings.​mlr.​press/​v37/​germain15.​html
Metadata
Title
Cardinality estimation with smoothing autoregressive models
Authors
Yuming Lin
Zejun Xu
Yinghao Zhang
You Li
Jingwei Zhang
Publication date
28-07-2023
Publisher
Springer US
Published in
World Wide Web / Issue 5/2023
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-023-01195-7

Other articles of this Issue 5/2023

World Wide Web 5/2023 Go to the issue

Premium Partner