
Scaling question answering to the web

Published: 01 July 2001

Abstract

The wealth of information on the web makes it an attractive resource for seeking quick answers to simple, factual questions such as “who was the first American in space?” or “what is the second tallest mountain in the world?” Yet today's most advanced web search services (e.g., Google and AskJeeves) make it surprisingly tedious to locate answers to such questions. In this paper, we extend question-answering techniques, first studied in the information retrieval literature, to the web and experimentally evaluate their performance. First, we introduce Mulder, which we believe to be the first general-purpose, fully-automated question-answering system available on the web. Second, we describe Mulder's architecture, which relies on multiple search-engine queries, natural-language parsing, and a novel voting procedure to yield reliable answers coupled with high recall. Finally, we compare Mulder's performance to that of Google and AskJeeves on questions drawn from the TREC-8 question answering track. We find that Mulder's recall is more than a factor of three higher than that of AskJeeves. In addition, we find that Google requires 6.6 times as much user effort to achieve the same level of recall as Mulder.
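
The abstract names a voting procedure but does not specify it. As a rough illustration of the general idea only, the following Python sketch pools candidate answers extracted from several search results and ranks them by summed score. The function name `vote`, the normalization rule, and the example scores are all hypothetical and are not taken from Mulder.

```python
from collections import defaultdict


def vote(candidates):
    """Rank candidate answers by the sum of their scores.

    `candidates` is a list of (answer_text, score) pairs, e.g. one pair per
    snippet in which a candidate was found. Candidates that differ only in
    case or whitespace are pooled together. This is a generic illustration,
    not the procedure used by Mulder.
    """
    totals = defaultdict(float)
    surface = {}  # first-seen spelling for each normalized key
    for text, score in candidates:
        key = " ".join(text.lower().split())
        totals[key] += score
        surface.setdefault(key, text.strip())
    # Highest total first; ties broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(surface[key], total) for key, total in ranked]


# Hypothetical candidates for "who was the first American in space?"
candidates = [
    ("Alan Shepard", 1.0),
    ("alan  shepard", 0.5),
    ("John Glenn", 0.75),
    ("Alan Shepard", 0.25),
]
ranked = vote(candidates)
print(ranked)
```

Pooling repeated candidates means an answer that appears in many retrieved pages can outrank a single high-scoring but isolated one, which is one plausible way redundancy on the web could be turned into reliability.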

