research-article

Neo: a learned query optimizer

Authors:
Ryan Marcus

Brandeis University

Brandeis University
View Profile

,
Parimarjan Negi

MIT

MIT
View Profile

,
Hongzi Mao

MIT

MIT
View Profile

,
Chi Zhang

Brandeis University

Brandeis University
View Profile

,
Mohammad Alizadeh

MIT

MIT
View Profile

,
Tim Kraska

MIT

MIT
View Profile

,
Olga Papaemmanouil

Brandeis University

Brandeis University
View Profile

,
Nesime Tatbul

MIT and Intel Labs

MIT and Intel Labs
View Profile

Proceedings of the VLDB Endowment Volume 12 Issue 11pp 1705–1718https://doi.org/10.14778/3342263.3342644

Published:01 July 2019Publication History

Proceedings of the VLDB Endowment

Abstract

Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query executions plans. Neo bootstraps its query optimization model from existing optimizers and continues to learn from incoming queries, building upon its successes and learning from its failures. Furthermore, Neo naturally adapts to underlying data patterns and is robust to estimation errors. Experimental results demonstrate that Neo, even when bootstrapped from a simple optimizer like PostgreSQL, can learn a model that offers similar performance to state-of-the-art commercial optimizers, and in some cases even surpass them.

References

Embedding tools, https://github.com/parimarjan/db-embedding-tools.Google Scholar
Ext-JOB queries, https://git.io/extended_job.Google Scholar
PostgreSQL database, http://www.postgresql.org/.Google Scholar
T. Anthony, Z. Tian, and D. Barber. Thinking Fast and Slow with Deep Learning and Tree Search. In Advances in Neural Information Processing Systems 30, NIPS '17, pages 5366--5376, 2017. Google ScholarDigital Library
J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer Normalization. arXiv:1607.06450 {cs, stat}, July 2016.Google Scholar
B. Babcock and S. Chaudhuri. Towards a Robust Query Optimizer: A Principled and Practical Approach. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD '05, pages 119--130, New York, NY, USA, 2005. ACM. Google ScholarDigital Library
R. Bellman. A Markovian Decision Process. Indiana University Mathematics Journal, 6(4):679--684, 1957.Google ScholarCross Ref
Y. Bengio. Deep Learning of Representations for Unsupervised and Transfer Learning. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, ICML WUTL '12, pages 17--36, June 2012. Google ScholarDigital Library
R. Bordawekar and O. Shmueli. Using Word Embedding to Enable Semantic Queries in Relational Databases. In Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning (DEEM), DEEM '17, pages 5:1--5:4, 2017. Google ScholarDigital Library
P. P. Brahma, D. Wu, and Y. She. Why Deep Learning Works: A Manifold Disentanglement Perspective. IEEE Transactions on Neural Networks and Learning Systems, 27(10):1997--2008, Oct. 2016.Google ScholarCross Ref
S. Chaudhuri. An Overview of Query Optimization in Relational Systems. In ACM SIGMOD Symposium on Principles of Database Systems, SIGMOD '98, pages 34--43, 1998. Google ScholarDigital Library
R. Dechter and J. Pearl. Generalized Best-first Search Strategies and the Optimality of A*. J. ACM, 32(3):505--536, July 1985. Google ScholarDigital Library
M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. H. Hovy, and N. A. Smith. Retrofitting Word Vectors to Semantic Lexicons. In The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL '15, pages 1606--1615, 2015.Google ScholarCross Ref
R. C. Fernandez and S. Madden. Termite: A System for Tunneling Through Heterogeneous Data. In AIDM @ SIGMOD 2019, aiDM '19, 2019. Google ScholarDigital Library
L. Giakoumakis and C. A. Galindo-Legaria. Testing SQL Server's Query Optimizer: Challenges, Techniques and Experiences. IEEE Data Eng. Bull., 31:36--43, 2008.Google Scholar
X. Glorot, A. Bordes, and Y. Bengio. Deep Sparse Rectifier Neural Networks. In G. Gordon, D. Dunson, and M. Dudík, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of PMLR '11, pages 315--323, Fort Lauderdale, FL, USA, Apr. 2011. PMLR.Google Scholar
A. Heidari, J. McGrath, I. F. Ilyas, and T. Rekatsinas. HoloDetect: Few-Shot Learning for Error Detection. arXiv:1904.02285 {cs}, Apr. 2019. Google ScholarDigital Library
T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, D. Horgan, J. Quan, A. Sendonaris, G. Dulac-Arnold, I. Osband, J. Agapiou, J. Z. Leibo, and A. Gruslys. Deep Q-learning from Demonstrations. In Thirty-Second AAAI Conference on Artifical Intelligence, AAAI '18, New Orleans, Apr. 2017. IEEE.Google Scholar
I. F. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies. In ACM SIGMOD International Conference on Management of Data, SIGMOD '04, pages 647--658, 2004. Google ScholarDigital Library
T. Kaftan, M. Balazinska, A. Cheung, and J. Gehrke. Cuttlefish: A Lightweight Primitive for Adaptive Query Processing. arXiv preprint, Feb. 2018.Google Scholar
D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. In 3rd International Conference for Learning Representations, ICLR '15, San Diego, CA, 2015.Google Scholar
A. Kipf, T. Kipf, B. Radke, V. Leis, P. Boncz, and A. Kemper. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In 9th Biennial Conference on Innovative Data Systems Research, CIDR '19, 2019.Google Scholar
T. Kraska, M. Alizadeh, A. Beutel, Ed Chi, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, and Vikram Nathan. SageDB: A Learned Database System. In 9th Biennial Conference on Innovative Data Systems Research, CIDR '19, 2019.Google Scholar
T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 489--504, New York, NY, USA, 2018. ACM. Google ScholarDigital Library
S. Krishnan, Z. Yang, K. Goldberg, J. Hellerstein, and I. Stoica. Learning to Optimize Join Queries With Deep Reinforcement Learning. arXiv:1808.03196 {cs}, Aug. 2018.Google Scholar
V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. How Good Are Query Optimizers, Really? PVLDB, 9(3):204--215, 2015. Google ScholarDigital Library
V. Leis, B. Radke, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. Query optimization through the looking glass, and what we found running the Join Order Benchmark. The VLDB Journal, pages 1--26, Sept. 2017. Google ScholarDigital Library
H. Liu, M. Xu, Z. Yu, V. Corvinelli, and C. Zuzarte. Cardinality Estimation Using Neural Networks. In Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, CASCON '15, pages 53--59, Riverton, NJ, USA, 2015. IBM Corp.Google ScholarDigital Library
W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi. A survey of deep neural network architectures and their applications. Neurocomputing, 234:11--26, Apr. 2017.Google ScholarCross Ref
G. Lohman. Is Query Optimization a '"Solved" Problem? In ACM SIGMOD Blog, ACM Blog '14, 2014.Google Scholar
J. Long, E. Shelhamer, and T. Darrell. Fully Convolutional Networks for Semantic Segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), CVPR '15, June 2015.Google Scholar
H. Mao, M. Schwarzkopf, S. B. Venkatakrishnan, Z. Meng, and M. Alizadeh. Learning Scheduling Algorithms for Data Processing Clusters. arXiv:1810.01963 {cs, stat}, 2018.Google Scholar
G. Marcus. Innateness, AlphaZero, and Artificial Intelligence. arXiv:1801.05667 {cs}, Jan. 2018.Google Scholar
R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul. Neo: Towards A Learned Query Optimizer. arXiv:1904.03711 {cs}, Apr. 2019.Google Scholar
R. Marcus and O. Papaemmanouil. Deep Reinforcement Learning for Join Order Enumeration. In First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM '18, Houston, TX, 2018. Google ScholarDigital Library
R. Marcus and O. Papaemmanouil. Towards a Hands-Free Query Optimizer through Deep Learning. In 9th Biennial Conference on Innovative Data Systems Research, CIDR '19, 2019.Google Scholar
T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space. arXiv:13013781 {cs}, Jan. 2013.Google Scholar
V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, and G. Ostrovski. Human-level control through deep reinforcement learning. Nature, 518(7540):529--533, 2015.Google ScholarCross Ref
G. Moerkotte, T. Neumann, and G. Steidl. Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors. PVLDB, 2(1):982--993, 2009. Google ScholarDigital Library
L. Mou, G. Li, L. Zhang, T. Wang, and Z. Jin. Convolutional Neural Networks over Tree Structures for Programming Language Processing. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI '16, pages 1287--1293, Phoenix, Arizona, 2016. AAAI Press. Google ScholarDigital Library
S. Mudgal, H. Li, T. Rekatsinas, A. Doan, Y. Park, G. Krishnan, R. Deep, E. Arcaute, and V. Raghavendra. Deep Learning for Entity Matching: A Design Space Exploration. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 19--34, New York, NY, USA, 2018. ACM. Google ScholarDigital Library
B. Nevarez. Inside the SQL Server Query Optimizer. Red Gate books, Mar. 2011. Google ScholarDigital Library
J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. Learning State Representations for Query Optimization with Deep Reinforcement Learning. In 2nd Workshop on Data Managmeent for End-to-End Machine Learning, DEEM '18, 2018. Google ScholarDigital Library
Y. Park, S. Zhong, and B. Mozafari. QuickSel: Quick Selectivity Learning with Mixture Models. arXiv:1812.10568 {cs}, Dec. 2018.Google Scholar
M. Poess and C. Floyd. New TPC Benchmarks for Decision Support and Web Commerce. SIGMOD Records, 29(4):64--71, Dec. 2000. Google ScholarDigital Library
A. G. Read. DeWitt clauses: Can we protect purchasers without hurting Microsoft. Rev. Litig., 25:387, 2006.Google Scholar
R. Rehůřek and P. Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, LREC '10, pages 45--50. ELRA, May 2010.Google Scholar
S. Schaal. Learning from Demonstration. In Proceedings of the 9th International Conference on Neural Information Processing Systems, NIPS'96, pages 1040--1046, Cambridge, MA, USA, 1996. MIT Press. Google ScholarDigital Library
M. Schaarschmidt, A. Kuhnle, B. Ellis, K. Fricke, F. Gessert, and E. Yoneki. LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations. arXiv:1808.07903 {cs, stat}, Aug. 2018.Google Scholar
J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85--117, Jan. 2015. Google ScholarDigital Library
P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access Path Selection in a Relational Database Management System. In J. Mylopolous and M. Brodie, editors, SIGMOD '89, SIGMOD '89, pages 511--522, San Francisco (CA), 1989. Morgan Kaufmann. Google ScholarDigital Library
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484--489, Jan. 2016.Google ScholarCross Ref
M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - DB2's LEarning Optimizer. In VLDB, VLDB '01, pages 19--28, 2001. Google ScholarDigital Library
R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998. Google ScholarDigital Library
N. Tran, A. Lamb, L. Shrinivas, S. Bodagala, and J. Dave. The Vertica Query Optimizer: The case for specialized query optimizers. In 2014 IEEE 30th International Conference on Data Engineering, ICDE '14, pages 1108--1119, Mar. 2014.Google ScholarCross Ref
I. Trummer, S. Moseley, D. Maram, S. Jo, and J. Antonakakis. SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning. PVLDB, 11(12):2074--2077, 2018. Google ScholarDigital Library
K. Tzoumas, T. Sellis, and C. Jensen. A Reinforcement Learning Approach for Adaptive Query Processing. Technical Reports, June 2008.Google Scholar
L. van der Maaten and G. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579--2605, 2008.Google Scholar
F. Waas and A. Pellenkoft. Join Order Selection (Good Enough Is Easy). In Advances in Databases, BNCD '00, pages 51--67. Springer, Berlin, Heidelberg, July 2000. Google ScholarDigital Library
W. Wang, M. Zhang, G. Chen, H. V. Jagadish, B. C. Ooi, and K.-L. Tan. Database Meets Deep Learning: Challenges and Opportunities. SIGMOD Rec., 45(2):17--22, Sept. 2016. Google ScholarDigital Library
L. Yu, J. Wang, K. R. Lai, and X. Zhang. Refining Word Embeddings Using Intensity Scores for Sentiment Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(3):671--681, Mar. 2018. Google ScholarDigital Library
J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman. Toward Multimodal Image-to-Image Translation. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, NIPS '17, pages 465--476. Curran Associates, Inc., 2017. Google ScholarDigital Library
S. Zilberstein. Using Anytime Algorithms in Intelligent Systems. AI Magazine, 17(3):73--73, Mar. 1996.Google Scholar

Index Terms

Neo: a learned query optimizer
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
  2. Information retrieval
    1. Information retrieval query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Recommendations

NEO 2016: Results of the Numerical and Evolutionary Optimization Workshop NEO 2016 and the NEO Cities 2016 Workshop held on September 20-24, 2016 in ...
Read More
NEO 2015: Results of the Numerical and Evolutionary Optimization Workshop NEO 2015 held at September 23-25 2015 in Tijuana, Mexico
Read More
Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration–exploitation balance with bio-inspired neural networks
Abstract
Recent theoretical and experimental works have connected Hebbian plasticity with the reinforcement learning (RL) paradigm, producing a class of trial-and-error learning in artificial neural networks known as neo-Hebbian plasticity. ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 12, Issue 11
July 2019
543 pages
ISSN:2150-8097
Editors:
Lei Chen,
Fatma Özcan
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 July 2019
Published in pvldb Volume 12, Issue 11
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 117
  Total Citations
  View Citations
- 756
  Total Downloads
- Downloads (Last 12 months)144
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Neo: a learned query optimizer

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

NEO 2016: Results of the Numerical and Evolutionary Optimization Workshop NEO 2016 and the NEO Cities 2016 Workshop held on September 20-24, 2016 in ...

NEO 2015: Results of the Numerical and Evolutionary Optimization Workshop NEO 2015 held at September 23-25 2015 in Tijuana, Mexico

Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration–exploitation balance with bio-inspired neural networks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Neo: a learned query optimizer

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

NEO 2016: Results of the Numerical and Evolutionary Optimization Workshop NEO 2016 and the NEO Cities 2016 Workshop held on September 20-24, 2016 in ...

NEO 2015: Results of the Numerical and Evolutionary Optimization Workshop NEO 2015 held at September 23-25 2015 in Tijuana, Mexico

Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration–exploitation balance with bio-inspired neural networks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media