research-article

Natural language to SQL: where are we today?

Published: 01 June 2020

Abstract

Translating natural language to SQL (NL2SQL) has received extensive attention lately, especially with the recent success of deep learning technologies. However, despite the large number of studies, we do not have a thorough understanding of how good existing techniques really are and how applicable they are to real-world situations. A key difficulty is that different studies are based on different datasets, which often have their own limitations and assumptions that are implicit in the context or the datasets. Moreover, a couple of evaluation metrics are commonly employed, but they are rather simplistic and do not properly reflect the accuracy of results, as will be shown in our experiments. To provide a holistic view of NL2SQL technologies and assess current advancements, we perform extensive experiments under our unified framework using eleven recent techniques over 10+ benchmarks, including a new benchmark (WTQ) and TPC-H. We provide a comprehensive survey of recent NL2SQL methods and introduce a taxonomy of them. We reveal major assumptions of the methods and classify translation errors through extensive experiments. We also provide a practical tool for validation that uses existing, mature database technologies such as query rewrite and database testing. We then suggest future research directions so that the translation can be used in practice.


Published in

Proceedings of the VLDB Endowment, Volume 13, Issue 10
June 2020
193 pages
ISSN: 2150-8097

Publisher: VLDB Endowment
