ABSTRACT
We consider the problem of open-domain question answering (Open QA) over massive knowledge bases (KBs). Existing approaches use either manually curated KBs like Freebase or KBs automatically extracted from unstructured text. In this paper, we present OQA, the first approach to leverage both curated and extracted KBs.
A key technical challenge is designing systems that are robust to the high variability in both natural language questions and massive KBs. OQA achieves robustness by decomposing the full Open QA problem into smaller sub-problems including question paraphrasing and query reformulation. OQA solves these sub-problems by mining millions of rules from an unlabeled question corpus and across multiple KBs. OQA then learns to integrate these rules by performing discriminative training on question-answer pairs using a latent-variable structured perceptron algorithm. We evaluate OQA on three benchmark question sets and demonstrate that it achieves up to twice the precision and recall of a state-of-the-art Open QA system.
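The discriminative training step named above can be illustrated with a minimal sketch of one latent-variable perceptron update. This is not the authors' implementation: the derivation format (a dict with an "answer" and sparse "features"), the feature names, and the helper functions are hypothetical, chosen only to show how a hidden derivation that reaches a correct answer can serve as the update target when only question-answer pairs are labeled.

```python
from collections import Counter

def score(weights, features):
    """Linear score of one derivation under the current weights."""
    return sum(weights[name] * value for name, value in features.items())

def best(weights, derivations):
    """Highest-scoring derivation (ties go to the earliest one)."""
    return max(derivations, key=lambda d: score(weights, d["features"]))

def perceptron_update(weights, derivations, gold_answers):
    """Apply one latent-variable perceptron step; return True if weights changed.

    The derivation is latent: supervision gives only the gold answers, so the
    update target is the best-scoring derivation that yields a gold answer.
    """
    if not derivations:
        return False
    predicted = best(weights, derivations)
    if predicted["answer"] in gold_answers:
        return False  # prediction already correct, no update
    correct = [d for d in derivations if d["answer"] in gold_answers]
    if not correct:
        return False  # no derivation reaches a gold answer, skip the example
    target = best(weights, correct)
    for name, value in target["features"].items():
        weights[name] += value      # promote the best correct derivation
    for name, value in predicted["features"].items():
        weights[name] -= value      # demote the wrong prediction
    return True

# Hypothetical candidate derivations for a single question.
derivations = [
    {"answer": "Lyon", "features": {"kb:extracted": 1.0}},
    {"answer": "Paris", "features": {"paraphrase:capital_of": 1.0}},
]
weights = Counter()  # missing features score as 0
changed = perceptron_update(weights, derivations, {"Paris"})
```

After this single update the derivation that answers "Paris" outscores the one that answers "Lyon", so a second call on the same example leaves the weights unchanged.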
Index Terms
- Open question answering over curated and extracted knowledge bases