DOI: 10.1145/502585.502610
Article

Mining the web for answers to natural language questions

Published: 05 October 2001

ABSTRACT

The Web is becoming one of the largest information and knowledge repositories. Many large-scale search engines (Google, Fast, Northern Light, etc.) have emerged to help users find information. In this paper, we study how to use these existing search engines effectively to mine the Web and discover the "correct" answers to factual natural language questions. We propose a probabilistic algorithm called QASM (Question Answering using Statistical Models) that learns the best query paraphrase of a natural language question. We validate our approach for both local and Web search engines using questions from the TREC evaluation. We also show how this algorithm can be combined with another algorithm (AnSel) to produce precise answers to natural language questions.

          Published in

          CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
          October 2001
          616 pages
          ISBN: 1581134363
          DOI: 10.1145/502585

          Copyright © 2001 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
