Article

Mining the web for answers to natural language questions

Authors:
Dragomir R. Radev, University of Michigan, Ann Arbor, MI
Hong Qi, University of Michigan, Ann Arbor, MI
Zhiping Zheng, University of Michigan, Ann Arbor, MI
Sasha Blair-Goldensohn, University of Michigan, Ann Arbor, MI
Zhu Zhang, University of Michigan, Ann Arbor, MI
Weiguo Fan, University of Michigan, Ann Arbor, MI
John Prager, IBM TJ Watson Research Center, Hawthorne, NY

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management, October 2001, Pages 143–150. https://doi.org/10.1145/502585.502610

Published: 05 October 2001

ABSTRACT

The Web is becoming one of the largest information and knowledge repositories. Many large-scale search engines (Google, Fast, Northern Light, etc.) have emerged to help users find information. In this paper, we study how to use these existing search engines effectively to mine the Web and discover the "correct" answers to factual natural language questions. We propose a probabilistic algorithm called QASM (Question Answering using Statistical Models) that learns the best query paraphrase of a natural language question. We validate our approach for both local and Web search engines using questions from the TREC evaluation. We also show how this algorithm can be combined with another algorithm (AnSel) to produce precise answers to natural language questions.

Index Terms

Mining the web for answers to natural language questions
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
  2. Information storage systems

Published in
CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN:1581134363
DOI:10.1145/502585
Editors:
Henrique Paques, Georgia Institute of Technology
Ling Liu, Georgia Institute of Technology
David Grossman, Illinois Institute of Technology
General Chair:
Calton Pu, Georgia Institute of Technology
Copyright © 2001 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%