ABSTRACT
The Web is becoming one of the largest repositories of information and knowledge. Many large-scale search engines (Google, Fast, Northern Light, etc.) have emerged to help users find information. In this paper, we study how to use these existing search engines effectively to mine the Web and discover the "correct" answers to factual natural language questions. We propose a probabilistic algorithm called QASM (Question Answering using Statistical Models) that learns the best query paraphrase of a natural language question. We validate our approach for both local and web search engines using questions from the TREC evaluation. We also show how this algorithm can be combined with another algorithm (AnSel) to produce precise answers to natural language questions.