DOI: 10.1145/3035918.3054783
research-article

Natural Language Data Management and Interfaces: Recent Development and Open Challenges

Published: 09 May 2017

ABSTRACT

The volume of natural language text data has grown rapidly over the past two decades, driven by factors such as the growth of the Web, the low cost of publishing, and progress in the digitization of printed texts. This growth, combined with the proliferation of natural language systems for searching and retrieving information, provides tremendous opportunities for studying some of the areas where database systems and natural language processing systems overlap. This tutorial explores two areas of overlap that are especially relevant to the database community: (1) managing natural language text data in a relational database, and (2) developing natural language interfaces to databases. The tutorial presents state-of-the-art methods, related systems, research opportunities, and challenges covering both areas.


Published in

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN: 9781450341974
DOI: 10.1145/3035918

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
