ABSTRACT
Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five stages: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this paper we describe probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), which uses proximity and question type features, achieves a total reciprocal document rank of 0.20 on the TREC-8 corpus. Our techniques have been implemented as a Web-accessible system called NSIR.
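To make the answer-ranking idea concrete, the following is a minimal sketch of reranking candidate phrases with a proximity feature and a question type feature. It is an illustration under stated assumptions, not the PPR algorithm itself: the scoring formula (type probability multiplied by an averaged inverse-distance score), the `Candidate` structure, the type labels, and the probabilities in `type_probs` are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    phrase: str        # candidate answer phrase extracted from a passage
    phrase_type: str   # hypothetical type label, e.g. "PERSON", "PLACE", "DATE"
    position: int      # token index where the phrase starts in the passage


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def proximity_score(tokens, query_terms, position):
    """Average over query terms of 1 / (1 + distance to the nearest occurrence).

    This inverse-distance form is an assumption made for illustration.
    Query terms that do not occur in the passage contribute zero.
    """
    if not query_terms:
        return 0.0
    total = 0.0
    for term in query_terms:
        hits = [i for i, tok in enumerate(tokens) if tok == term]
        if hits:
            total += 1.0 / (1 + min(abs(i - position) for i in hits))
    return total / len(query_terms)


def rerank(passage, query_terms, candidates, type_probs):
    """Order candidate phrases by type probability times proximity score."""
    tokens = tokenize(passage)
    scored = [
        (type_probs.get(c.phrase_type, 0.0)
         * proximity_score(tokens, query_terms, c.position), c.phrase)
        for c in candidates
    ]
    return [phrase for _, phrase in sorted(scored, key=lambda pair: -pair[0])]


# Toy input for the question "Who was the first American in space?"
passage = "Alan Shepard was the first American in space, launched from Florida in 1961"
query_terms = ["first", "american", "space"]
candidates = [
    Candidate("Alan Shepard", "PERSON", 0),
    Candidate("Florida", "PLACE", 10),
    Candidate("1961", "DATE", 12),
]
# Hypothetical probabilities of each answer type given a "who" question.
type_probs = {"PERSON": 0.8, "PLACE": 0.1, "DATE": 0.1}

ranked = rerank(passage, query_terms, candidates, type_probs)
print(ranked)
```

In this toy input, "Florida" lies closer to the query terms than "Alan Shepard", so the proximity feature alone would favor it; the question type feature is what moves the person name to the top. That interaction is the reason a ranker might combine the two signals.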