research-article

Open Access

SQLizer: query synthesis from natural language

Authors:
Navid Yaghmazadeh

University of Texas at Austin, USA

University of Texas at Austin, USA
View Profile

,
Yuepeng Wang

University of Texas at Austin, USA

University of Texas at Austin, USA
View Profile

,
Isil Dillig

University of Texas at Austin, USA

University of Texas at Austin, USA
View Profile

,
Thomas Dillig

University of Texas at Austin, USA

University of Texas at Austin, USA
View Profile

Proceedings of the ACM on Programming Languages Volume 1 Issue OOPSLAArticle No.: 63pp 1–26https://doi.org/10.1145/3133887

Published:12 October 2017Publication History

Proceedings of the ACM on Programming Languages

Abstract

This paper presents a new technique for automatically synthesizing SQL queries from natural language (NL). At the core of our technique is a new NL-based program synthesis methodology that combines semantic parsing techniques from the NLP community with type-directed program synthesis and automated program repair. Starting with a program sketch obtained using standard parsing techniques, our approach involves an iterative refinement loop that alternates between probabilistic type inhabitation and automated sketch repair. We use the proposed idea to build an end-to-end system called SQLIZER that can synthesize SQL queries from natural language. Our method is fully automated, works for any database without requiring additional customization, and does not require users to know the underlying database schema. We evaluate our approach on over 450 natural language queries concerning three different databases, namely MAS, IMDB, and YELP. Our experiments show that the desired query is ranked within the top 5 candidates in close to 90% of the cases and that SQLIZER outperforms NALIR, a state-of-the-art tool that won a best paper award at VLDB'14.

References

I Androutsopoulos, G Ritchie, and P Thanisch. 1993. Masque/sql: An Efficient and Portable Natural Language Query Interface for Relational Databases. Tech report, University of Edinburgh (1993).Google Scholar
Ion Androutsopoulos, Graeme D. Ritchie, and Peter Thanisch. 1995. Natural language interfaces to databases - An Introduction. Natural Language Engineering (1995).Google Scholar
Thomas Ball, Mayur Naik, and Sriram K. Rajamani. 2003. From symptom to cause: localizing errors in counterexample traces. In POPL. 97–105. Google ScholarDigital Library
Daniel W. Barowy, Sumit Gulwani, Ted Hart, and Benjamin G. Zorn. 2015. FlashRelate: extracting relational data from semi-structured spreadsheets using examples. ACM, 218–228.Google Scholar
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In EMNLP. 1533–1544.Google Scholar
Bob Carpenter. 1997. Type-logical semantics. MIT press.Google Scholar
Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. 2013. Optimizing database-backed applications with query synthesis. In PLDI. 3–14. Google ScholarDigital Library
E. F. Codd. 1974. Seven Steps to Rendezvous with the Casual User. In IFIP Working Conference Data Base Management. 179–200.Google Scholar
Aditya Desai, Sumit Gulwani, Vineet Hingorani, Nidhi Jain, Amey Karkare, Mark Marron, Sailesh R, and Subhajit Roy. 2016. Program synthesis using natural language. In ICSE.Google Scholar
Ramez Elmasri and Shamkant B. Navathe. 2011. Fundamentals of Database Systems. Addison-Wesley.Google Scholar
Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017a. Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples. In Programming Language Design and Implementation. Google ScholarDigital Library
Yu Feng, Ruben Martins, Yuepeng Wang, Isil Dillig, and Thomas W. Reps. 2017b. Component-based synthesis for complex APIs. In POPL. 599–612. Google ScholarDigital Library
John K Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing data structure transformations from input-output examples. In PLDI. 229–239.Google Scholar
Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A systematic study of automated program repair. In ICSE. 3–13.Google Scholar
Alex Groce and Willem Visser. 2003. What Went Wrong: Explaining Counterexamples. In SPIN.Google Scholar
Barbara J. Grosz, Douglas E. Appelt, Paul A. Martin, and Fernando C. N. Pereira. 1987. TEAM: An Experiment in the Design of Transportable Natural-Language Interfaces. Artificial Intelligence 32, 2 (1987), 173–243. Google ScholarDigital Library
Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. In ACM SIGPLAN Notices, Vol. 46. ACM, 317–330. Google ScholarDigital Library
Sumit Gulwani and Mark Marron. 2014. NLyze: interactive programming by natural language for spreadsheet data analysis and manipulation. In SIGMOD. Google ScholarDigital Library
Tihomir Gvero, Viktor Kuncak, Ivan Kuraj, and Ruzica Piskac. 2013. Complete completion using types and weights. In ACM SIGPLAN Notices , Vol. 48. ACM, 27–38. Google ScholarDigital Library
Gary G. Hendrix, Earl D. Sacerdoti, Daniel Sagalowicz, and Jonathan Slocum. 1978. Developing a Natural Language Interface to Complex Data. TODS 3, 2 (1978), 105–147. Google ScholarDigital Library
James A. Jones and Mary Jean Harrold. 2005. Empirical evaluation of the tarantula automatic fault-localization technique. In ASE. 273–282. Google ScholarDigital Library
James A. Jones, Mary Jean Harrold, and John T. Stasko. 2002. Visualization of test information to assist fault localization. In ICSE . 467–477. Google ScholarCross Ref
Manu Jose and Rupak Majumdar. 2011a. Bug-Assist: assisting fault localization in ANSI-C programs. In CAV. Google ScholarCross Ref
Manu Jose and Rupak Majumdar. 2011b. Cause clue clauses: error localization using maximum satisfiability. PLDI (2011).Google ScholarDigital Library
Rohit J. Kate and Raymond J. Mooney. 2006. Using String-Kernels for Learning Semantic Parsers. In ACL. Google ScholarDigital Library
Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney. 2005. Learning to Transform Natural to Formal Languages. In AAAI. 1062–1068.Google Scholar
Vu Le and Sumit Gulwani. 2014. FlashExtract: a framework for data extraction by examples. ACM, 542–553.Google Scholar
Vu Le, Sumit Gulwani, and Zhendong Su. 2013. Smartsynth: Synthesizing smartphone automation scripts from natural language. In MobiSys. 193–206.Google ScholarDigital Library
Fei Li and H. V. Jagadish. 2014. Constructing an Interactive Natural Language Interface for Relational Databases. PVLDB 8, 1 (2014), 73–84. Google ScholarDigital Library
Yunyao Li, Huahai Yang, and H. V. Jagadish. 2006. Constructing a Generic Natural Language Interface for an XML Database. In EDBT. 737–754. Google ScholarDigital Library
Percy Liang and Christopher Potts. 2015. Bringing machine learning and compositional semantics together. Annual Review of Linguistics 1, 1 (2015), 355–376. Google ScholarCross Ref
Fan Long and Martin Rinard. 2015. Staged program repair with condition synthesis. In ESEC/FSE. Google ScholarDigital Library
Fan Long and Martin Rinard. 2016. Automatic patch generation by learning correct code. In POPL. Google ScholarDigital Library
Bill MacCartney and Christopher D Manning. 2009. An extended model of natural logic. In IWCS. 140–156.Google Scholar
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In ACL System Demonstrations. 55–60. Google ScholarCross Ref
Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS.Google Scholar
Scott Miller, David Stallard, Robert J. Bobrow, and Richard M. Schwartz. 1996. A Fully Statistical Approach to Natural Language Interfaces. In ACL. 55–61. Google ScholarDigital Library
Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. 2013. SemFix: program repair via semantic analysis. In ICSE. 772–781.Google Scholar
Peter-Michael Osera and Steve Zdancewic. 2015. Type- and example-directed program synthesis. In PLDI.Google Scholar
Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program synthesis from polymorphic refinement types. In PLDI. 522–538. Google ScholarDigital Library
Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: A framework for inductive program synthesis. ACM, 107–126.Google Scholar
Ana-Maria Popescu, Alex Armanasu, Oren Etzioni, David Ko, and Alexander Yates. 2004. Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability. In COLING.Google Scholar
Ana-Maria Popescu, Oren Etzioni, and Henry A. Kautz. 2003. Towards a theory of natural language interfaces to databases. In IUI. 149–157.Google Scholar
Chris Quirk, Raymond Mooney, and Michel Galley. 2015. Language to code: Learning semantic parsers for if-this-then-that recipes. In ACL. 878–888.Google Scholar
Mohammad Raza, Sumit Gulwani, and Natasa Milic-Frayling. 2015. Compositional Program Synthesis from Natural Language and Examples. In IJCAI.Google Scholar
Armando Solar Lezama. 2008. Program Synthesis By Sketching. Ph.D. Dissertation. EECS Department, University of California, Berkeley.Google Scholar
Armando Solar-Lezama, Rodric M. Rabbah, Rastislav Bodík, and Kemal Ebcioglu. 2005. Programming by sketching for bit-streaming programs. In PLDI. Google ScholarDigital Library
Armando Solar-Lezama, Liviu Tancau, Rastislav Bodík, Sanjit A. Seshia, and Vijay A. Saraswat. 2006. Combinatorial sketching for finite programs. In ASPLOS. 404–415. Google ScholarDigital Library
Lappoon R Tang and Raymond J Mooney. 2000. Automated construction of database interfaces: Integrating statistical and relational learning for semantic parsing. In EMNLP. 133–141.Google Scholar
Quoc Trung Tran, Chee-Yong Chan, and Srinivasan Parthasarathy. 2009. Query by output. In SIGMOD. 535–548. Google ScholarDigital Library
Chenglong Wang, Alvin Cheung, and Ras Bodik. 2017. Synthesizing Highly Expressive SQL Queries from Input-Output Examples. In Programming Language Design and Implementation. Google ScholarDigital Library
David H. D. Warren and Fernando C. N. Pereira. 1982. An Efficient Easily Adaptable System for Interpreting Natural Language Queries. American Journal of Computational Linguistics 8, 3-4 (1982), 110–122.Google Scholar
Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009. Automatically finding patches using genetic programming. In ICSE. 364–374.Google Scholar
Ben Wiedermann, Ali Ibrahim, and William R. Cook. 2008. Interprocedural query extraction for transparent persistence. In OOPSLA . 19–36. Google ScholarDigital Library
William A Woods, Ronald M Kaplan, and Bonnie Nash-Webber. 1972. The lunar sciences natural language information system. Bolt, Beranek and Newman.Google Scholar
Navid Yaghmazadeh, Christian Klinger, Isil Dillig, and Swarat Chaudhuri. 2016. Synthesizing transformations on hierarchically structured data. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation . ACM, 508–521. Google ScholarDigital Library
John M. Zelle and Raymond J. Mooney. 1993. Learning Semantic Grammars with Constructive Inductive Logic Programming. In AAAI. 817–822.Google Scholar
John M. Zelle and Raymond J. Mooney. 1996. Learning to Parse Database Queries Using Inductive Logic Programming. In AAAI . 1050–1055.Google Scholar
Sai Zhang and Yuyin Sun. 2013. Automatically synthesizing SQL queries from input-output examples. In ASE. Google ScholarDigital Library
Moshé M. Zloof. 1975. Query-by-Example: the Invocation and Definition of Tables and Forms. In VLDB. Google ScholarDigital Library

Index Terms

SQLizer: query synthesis from natural language

Recommendations

Access to Indexed Hierarchical Databases Using a Relational Query Language

An efficient means of accessing indexed hierarchical databases using a relational query language is presented. The purpose is to achieve an effective sharing of heterogeneous distributed databases. Translation of hierarchical data to an equivalent ...
Read More
Query processing over object views of relational data

This paper presents an approach to object view management for relational databases. Such a view mechanism makes it possible for users to transparently work with data in a relational database as if it was stored in an object-oriented (OO) database. A ...
Read More
Providing Quality Responses with Natural Language Interfaces: The Null Value Problem

An underlying relational database model and the database query language SQL are assumed, and methods are presented for responding with appropriate answers to null value responses. This is done by using a knowledge base based on RM/T, an extended ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the ACM on Programming Languages Volume 1, Issue OOPSLA
October 2017
1786 pages
EISSN:2475-1421
DOI:10.1145/3152284
Issue’s Table of Contents

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 October 2017
Published in pacmpl Volume 1, Issue OOPSLA

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Program Synthesis
Programming by Natural Languages
Relational Databases
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 122
  Total Citations
  View Citations
- 2,963
  Total Downloads
- Downloads (Last 12 months)541
- Downloads (Last 6 weeks)77
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

SQLizer: query synthesis from natural language

Proceedings of the ACM on Programming Languages

Abstract

References

Cited By

Index Terms

Recommendations

Access to Indexed Hierarchical Databases Using a Relational Query Language

Query processing over object views of relational data

Providing Quality Responses with Natural Language Interfaces: The Null Value Problem

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

SQLizer: query synthesis from natural language

Proceedings of the ACM on Programming Languages

Abstract

References

Cited By

Index Terms

Recommendations

Access to Indexed Hierarchical Databases Using a Relational Query Language

Query processing over object views of relational data

Providing Quality Responses with Natural Language Interfaces: The Null Value Problem

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media