DOI: 10.1145/1321440.1321455 (CIKM conference proceedings)
research-article

Lightweight web-based fact repositories for textual question answering

Published: 06 November 2007

ABSTRACT

Since answers to fact-seeking questions usually reside within small factual text nuggets, often "hidden" within full-length documents, their relevance to a question is not necessarily correlated to the relevance of the full-length document to the question. Yet previous approaches to open-domain textual question answering from large document collections quasi-unanimously employ a document retrieval stage, in order to apply widely different, often expensive answer mining techniques to only a small subset of documents. Depending on the collection size, 95% or more of the documents in the collection (much more in the case of the Web) are left out of the selected subset for any given query, and thus become invisible to subsequent processing stages for actual answer mining. This paper introduces a new model for answer retrieval for question answering. The collection is distilled offline into large repositories of facts. Each fact constitutes a potential direct answer to questions seeking a particular kind of entity or relation, such as questions asking about the date of particular events. Question answering becomes equivalent to online fact retrieval, which greatly simplifies the de-facto system architecture for fact-seeking question answering. In addition to simplicity, experiments on a fact repository acquired from approximately a billion Web documents illustrate the impact of fact repositories in extracting accurate answers to a standard evaluation set of open-domain test questions and additional sets of domain-specific questions.
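The two-phase model described in the abstract (offline distillation of the collection into facts, then online fact retrieval) can be illustrated with a toy sketch. This is not the paper's extraction or ranking method: the year regex, the stopword list, the term-overlap scoring, and the names `build_repository` and `answer_date_question` are all assumptions made for illustration, and the example handles only date-seeking questions.

```python
import re
from collections import Counter, defaultdict

# Assumption: a "date fact" is any sentence that mentions a four-digit year.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")
# Assumption: a tiny hand-picked stopword list, enough for the toy data.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "did", "when", "who", "what"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_repository(documents):
    """Offline phase: distill documents into (nugget, year) facts.

    Returns the list of facts and an inverted index from term to fact ids.
    """
    facts = []
    index = defaultdict(set)
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            match = YEAR.search(sentence)
            if match is None:
                continue  # no date in this sentence, so no date fact
            year = match.group(1)
            fact_id = len(facts)
            facts.append((sentence, year))
            for term in tokenize(sentence):
                if term != year:
                    index[term].add(fact_id)
    return facts, index


def answer_date_question(question, facts, index):
    """Online phase: retrieve facts that share terms with the question.

    Each matching fact votes for its year once per shared term; the year
    with the most votes is returned, or None if no fact matches.
    """
    overlap = Counter()
    for term in set(tokenize(question)):
        for fact_id in index.get(term, ()):
            overlap[fact_id] += 1
    votes = Counter()
    for fact_id, score in overlap.items():
        votes[facts[fact_id][1]] += score
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

In this sketch the documents themselves are never consulted at question time: only the fact list and its index are, which is the simplification the abstract attributes to fact repositories. Repeated mentions of the same fact across documents reinforce the same answer through voting.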
