Scaling question answering to the web

Authors:
Cody Kwok, University of Washington, Seattle, WA
Oren Etzioni, University of Washington, Seattle, WA
Daniel S. Weld, University of Washington, Seattle, WA

ACM Transactions on Information Systems, Volume 19, Issue 3, pp. 242–262. https://doi.org/10.1145/502115.502117

Published: 01 July 2001

Abstract

The wealth of information on the web makes it an attractive resource for seeking quick answers to simple, factual questions such as “who was the first American in space?” or “what is the second tallest mountain in the world?” Yet today's most advanced web search services (e.g., Google and AskJeeves) make it surprisingly tedious to locate answers to such questions. In this paper, we extend question-answering techniques, first studied in the information retrieval literature, to the web and experimentally evaluate their performance. First, we introduce Mulder, which we believe to be the first general-purpose, fully-automated question-answering system available on the web. Second, we describe Mulder's architecture, which relies on multiple search-engine queries, natural-language parsing, and a novel voting procedure to yield reliable answers coupled with high recall. Finally, we compare Mulder's performance to that of Google and AskJeeves on questions drawn from the TREC-8 question answering track. We find that Mulder's recall is more than a factor of three higher than that of AskJeeves. In addition, we find that Google requires 6.6 times as much user effort to achieve the same level of recall as Mulder.
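The abstract names a voting procedure over answers gathered from multiple search-engine queries but does not describe it. As a rough illustration only, the sketch below shows one generic form of score-weighted voting: candidate answers that normalize to the same string pool their scores, and the pooled totals decide the ranking. The function names, the normalization rule, and the scores are all hypothetical and are not taken from Mulder.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def vote(candidates):
    """Pool the scores of candidates that normalize to the same string.

    candidates: iterable of (answer_text, score) pairs, for example one pair
    per snippet returned by any of several search-engine queries.
    Returns a list of (normalized_answer, total_score), highest total first.
    """
    totals = defaultdict(float)
    for text, score in candidates:
        totals[normalize(text)] += score
    # Sort by descending total; break ties alphabetically for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical candidates and scores for "who was the first American in space?"
candidates = [
    ("Alan Shepard", 1.0),
    ("alan  shepard", 0.5),
    ("John Glenn", 1.25),
    ("Alan Shepard", 0.5),
]
ranked = vote(candidates)
print(ranked[0][0])
```

In this toy input no single mention of the top answer outscores the competing candidate, yet the pooled total does, which is the intuition behind letting repeated evidence vote.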

Index Terms

1. Information systems
  1. Information systems applications

Published in

ACM Transactions on Information Systems Volume 19, Issue 3
July 2001
119 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/502115

Copyright © 2001 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
answer extraction
answer selection
natural language processing
query formulation
search engines