skip to main content
research-article

Neo: a learned query optimizer

Published:01 July 2019Publication History
Skip Abstract Section

Abstract

Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query executions plans. Neo bootstraps its query optimization model from existing optimizers and continues to learn from incoming queries, building upon its successes and learning from its failures. Furthermore, Neo naturally adapts to underlying data patterns and is robust to estimation errors. Experimental results demonstrate that Neo, even when bootstrapped from a simple optimizer like PostgreSQL, can learn a model that offers similar performance to state-of-the-art commercial optimizers, and in some cases even surpass them.

References

  1. Embedding tools, https://github.com/parimarjan/db-embedding-tools.Google ScholarGoogle Scholar
  2. Ext-JOB queries, https://git.io/extended_job.Google ScholarGoogle Scholar
  3. PostgreSQL database, http://www.postgresql.org/.Google ScholarGoogle Scholar
  4. T. Anthony, Z. Tian, and D. Barber. Thinking Fast and Slow with Deep Learning and Tree Search. In Advances in Neural Information Processing Systems 30, NIPS '17, pages 5366--5376, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer Normalization. arXiv:1607.06450 {cs, stat}, July 2016.Google ScholarGoogle Scholar
  6. B. Babcock and S. Chaudhuri. Towards a Robust Query Optimizer: A Principled and Practical Approach. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD '05, pages 119--130, New York, NY, USA, 2005. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Bellman. A Markovian Decision Process. Indiana University Mathematics Journal, 6(4):679--684, 1957.Google ScholarGoogle ScholarCross RefCross Ref
  8. Y. Bengio. Deep Learning of Representations for Unsupervised and Transfer Learning. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, ICML WUTL '12, pages 17--36, June 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Bordawekar and O. Shmueli. Using Word Embedding to Enable Semantic Queries in Relational Databases. In Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning (DEEM), DEEM '17, pages 5:1--5:4, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. P. Brahma, D. Wu, and Y. She. Why Deep Learning Works: A Manifold Disentanglement Perspective. IEEE Transactions on Neural Networks and Learning Systems, 27(10):1997--2008, Oct. 2016.Google ScholarGoogle ScholarCross RefCross Ref
  11. S. Chaudhuri. An Overview of Query Optimization in Relational Systems. In ACM SIGMOD Symposium on Principles of Database Systems, SIGMOD '98, pages 34--43, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Dechter and J. Pearl. Generalized Best-first Search Strategies and the Optimality of A*. J. ACM, 32(3):505--536, July 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. H. Hovy, and N. A. Smith. Retrofitting Word Vectors to Semantic Lexicons. In The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL '15, pages 1606--1615, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  14. R. C. Fernandez and S. Madden. Termite: A System for Tunneling Through Heterogeneous Data. In AIDM @ SIGMOD 2019, aiDM '19, 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. L. Giakoumakis and C. A. Galindo-Legaria. Testing SQL Server's Query Optimizer: Challenges, Techniques and Experiences. IEEE Data Eng. Bull., 31:36--43, 2008.Google ScholarGoogle Scholar
  16. X. Glorot, A. Bordes, and Y. Bengio. Deep Sparse Rectifier Neural Networks. In G. Gordon, D. Dunson, and M. Dudík, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of PMLR '11, pages 315--323, Fort Lauderdale, FL, USA, Apr. 2011. PMLR.Google ScholarGoogle Scholar
  17. A. Heidari, J. McGrath, I. F. Ilyas, and T. Rekatsinas. HoloDetect: Few-Shot Learning for Error Detection. arXiv:1904.02285 {cs}, Apr. 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, D. Horgan, J. Quan, A. Sendonaris, G. Dulac-Arnold, I. Osband, J. Agapiou, J. Z. Leibo, and A. Gruslys. Deep Q-learning from Demonstrations. In Thirty-Second AAAI Conference on Artifical Intelligence, AAAI '18, New Orleans, Apr. 2017. IEEE.Google ScholarGoogle Scholar
  19. I. F. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies. In ACM SIGMOD International Conference on Management of Data, SIGMOD '04, pages 647--658, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. Kaftan, M. Balazinska, A. Cheung, and J. Gehrke. Cuttlefish: A Lightweight Primitive for Adaptive Query Processing. arXiv preprint, Feb. 2018.Google ScholarGoogle Scholar
  21. D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. In 3rd International Conference for Learning Representations, ICLR '15, San Diego, CA, 2015.Google ScholarGoogle Scholar
  22. A. Kipf, T. Kipf, B. Radke, V. Leis, P. Boncz, and A. Kemper. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In 9th Biennial Conference on Innovative Data Systems Research, CIDR '19, 2019.Google ScholarGoogle Scholar
  23. T. Kraska, M. Alizadeh, A. Beutel, Ed Chi, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, and Vikram Nathan. SageDB: A Learned Database System. In 9th Biennial Conference on Innovative Data Systems Research, CIDR '19, 2019.Google ScholarGoogle Scholar
  24. T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 489--504, New York, NY, USA, 2018. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Krishnan, Z. Yang, K. Goldberg, J. Hellerstein, and I. Stoica. Learning to Optimize Join Queries With Deep Reinforcement Learning. arXiv:1808.03196 {cs}, Aug. 2018.Google ScholarGoogle Scholar
  26. V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. How Good Are Query Optimizers, Really? PVLDB, 9(3):204--215, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. V. Leis, B. Radke, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. Query optimization through the looking glass, and what we found running the Join Order Benchmark. The VLDB Journal, pages 1--26, Sept. 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. H. Liu, M. Xu, Z. Yu, V. Corvinelli, and C. Zuzarte. Cardinality Estimation Using Neural Networks. In Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, CASCON '15, pages 53--59, Riverton, NJ, USA, 2015. IBM Corp.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi. A survey of deep neural network architectures and their applications. Neurocomputing, 234:11--26, Apr. 2017.Google ScholarGoogle ScholarCross RefCross Ref
  30. G. Lohman. Is Query Optimization a '"Solved" Problem? In ACM SIGMOD Blog, ACM Blog '14, 2014.Google ScholarGoogle Scholar
  31. J. Long, E. Shelhamer, and T. Darrell. Fully Convolutional Networks for Semantic Segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), CVPR '15, June 2015.Google ScholarGoogle Scholar
  32. H. Mao, M. Schwarzkopf, S. B. Venkatakrishnan, Z. Meng, and M. Alizadeh. Learning Scheduling Algorithms for Data Processing Clusters. arXiv:1810.01963 {cs, stat}, 2018.Google ScholarGoogle Scholar
  33. G. Marcus. Innateness, AlphaZero, and Artificial Intelligence. arXiv:1801.05667 {cs}, Jan. 2018.Google ScholarGoogle Scholar
  34. R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul. Neo: Towards A Learned Query Optimizer. arXiv:1904.03711 {cs}, Apr. 2019.Google ScholarGoogle Scholar
  35. R. Marcus and O. Papaemmanouil. Deep Reinforcement Learning for Join Order Enumeration. In First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM '18, Houston, TX, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. R. Marcus and O. Papaemmanouil. Towards a Hands-Free Query Optimizer through Deep Learning. In 9th Biennial Conference on Innovative Data Systems Research, CIDR '19, 2019.Google ScholarGoogle Scholar
  37. T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space. arXiv:13013781 {cs}, Jan. 2013.Google ScholarGoogle Scholar
  38. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, and G. Ostrovski. Human-level control through deep reinforcement learning. Nature, 518(7540):529--533, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  39. G. Moerkotte, T. Neumann, and G. Steidl. Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors. PVLDB, 2(1):982--993, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. L. Mou, G. Li, L. Zhang, T. Wang, and Z. Jin. Convolutional Neural Networks over Tree Structures for Programming Language Processing. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI '16, pages 1287--1293, Phoenix, Arizona, 2016. AAAI Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. S. Mudgal, H. Li, T. Rekatsinas, A. Doan, Y. Park, G. Krishnan, R. Deep, E. Arcaute, and V. Raghavendra. Deep Learning for Entity Matching: A Design Space Exploration. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 19--34, New York, NY, USA, 2018. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. B. Nevarez. Inside the SQL Server Query Optimizer. Red Gate books, Mar. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. Learning State Representations for Query Optimization with Deep Reinforcement Learning. In 2nd Workshop on Data Managmeent for End-to-End Machine Learning, DEEM '18, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Y. Park, S. Zhong, and B. Mozafari. QuickSel: Quick Selectivity Learning with Mixture Models. arXiv:1812.10568 {cs}, Dec. 2018.Google ScholarGoogle Scholar
  45. M. Poess and C. Floyd. New TPC Benchmarks for Decision Support and Web Commerce. SIGMOD Records, 29(4):64--71, Dec. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. A. G. Read. DeWitt clauses: Can we protect purchasers without hurting Microsoft. Rev. Litig., 25:387, 2006.Google ScholarGoogle Scholar
  47. R. Rehůřek and P. Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, LREC '10, pages 45--50. ELRA, May 2010.Google ScholarGoogle Scholar
  48. S. Schaal. Learning from Demonstration. In Proceedings of the 9th International Conference on Neural Information Processing Systems, NIPS'96, pages 1040--1046, Cambridge, MA, USA, 1996. MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. M. Schaarschmidt, A. Kuhnle, B. Ellis, K. Fricke, F. Gessert, and E. Yoneki. LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations. arXiv:1808.07903 {cs, stat}, Aug. 2018.Google ScholarGoogle Scholar
  50. J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85--117, Jan. 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access Path Selection in a Relational Database Management System. In J. Mylopolous and M. Brodie, editors, SIGMOD '89, SIGMOD '89, pages 511--522, San Francisco (CA), 1989. Morgan Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484--489, Jan. 2016.Google ScholarGoogle ScholarCross RefCross Ref
  53. M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - DB2's LEarning Optimizer. In VLDB, VLDB '01, pages 19--28, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. N. Tran, A. Lamb, L. Shrinivas, S. Bodagala, and J. Dave. The Vertica Query Optimizer: The case for specialized query optimizers. In 2014 IEEE 30th International Conference on Data Engineering, ICDE '14, pages 1108--1119, Mar. 2014.Google ScholarGoogle ScholarCross RefCross Ref
  56. I. Trummer, S. Moseley, D. Maram, S. Jo, and J. Antonakakis. SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning. PVLDB, 11(12):2074--2077, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. K. Tzoumas, T. Sellis, and C. Jensen. A Reinforcement Learning Approach for Adaptive Query Processing. Technical Reports, June 2008.Google ScholarGoogle Scholar
  58. L. van der Maaten and G. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579--2605, 2008.Google ScholarGoogle Scholar
  59. F. Waas and A. Pellenkoft. Join Order Selection (Good Enough Is Easy). In Advances in Databases, BNCD '00, pages 51--67. Springer, Berlin, Heidelberg, July 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. W. Wang, M. Zhang, G. Chen, H. V. Jagadish, B. C. Ooi, and K.-L. Tan. Database Meets Deep Learning: Challenges and Opportunities. SIGMOD Rec., 45(2):17--22, Sept. 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. L. Yu, J. Wang, K. R. Lai, and X. Zhang. Refining Word Embeddings Using Intensity Scores for Sentiment Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(3):671--681, Mar. 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman. Toward Multimodal Image-to-Image Translation. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, NIPS '17, pages 465--476. Curran Associates, Inc., 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. S. Zilberstein. Using Anytime Algorithms in Intelligent Systems. AI Magazine, 17(3):73--73, Mar. 1996.Google ScholarGoogle Scholar

Index Terms

  1. Neo: a learned query optimizer
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Proceedings of the VLDB Endowment
          Proceedings of the VLDB Endowment  Volume 12, Issue 11
          July 2019
          543 pages

          Publisher

          VLDB Endowment

          Publication History

          • Published: 1 July 2019
          Published in pvldb Volume 12, Issue 11

          Qualifiers

          • research-article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader