DOI: 10.1145/1099554.1099572
Article

Finding similar questions in large question and answer archives

Published: 31 October 2005

ABSTRACT

There has recently been a significant increase in the number of community-based question and answer services on the Web where people answer other people's questions. These services rapidly build up large archives of questions and answers, and these archives are a valuable linguistic resource. One of the major tasks in a question and answer service is to find questions in the archive that are semantically similar to a user's question. This enables high-quality answers from the archive to be retrieved and removes the time lag associated with a community-based system. In this paper, we discuss methods for question retrieval that are based on using the similarity between answers in the archive to estimate probabilities for a translation-based retrieval model. We show that with this model it is possible to find semantically similar questions with relatively little word overlap.
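
To illustrate how a translation-based retrieval model can match questions that share few words, here is a minimal sketch. It is not the paper's method: the abstract gives no formula, so the scoring below assumes a generic translation-based query-likelihood form, P(w | D) = (1 - lam) * sum_t P(w | t) * P(t | D) + lam * P(w | C). The function names, the smoothing weight `lam`, the toy archive, and the hand-written translation table `trans` are all hypothetical. The paper estimates translation probabilities from answer similarity, a step this sketch does not attempt.

```python
import math
from collections import Counter


def p_translate(w, t, trans):
    """P(w | t): probability that archive word t 'translates' to query word w.

    Assumption: words without an entry in the table translate only to themselves.
    """
    if t in trans:
        return trans[t].get(w, 0.0)
    return 1.0 if w == t else 0.0


def translation_score(query, doc, trans, background, lam=0.2):
    """Log-likelihood of the query under a smoothed translation model of doc."""
    doc_counts = Counter(doc)
    log_p = 0.0
    for w in query:
        # Sum over archive words t of P(w | t) * P_ml(t | doc).
        translated = sum(
            p_translate(w, t, trans) * count / len(doc)
            for t, count in doc_counts.items()
        )
        p = (1 - lam) * translated + lam * background.get(w, 0.0)
        if p == 0.0:
            return float("-inf")  # query word cannot be generated at all
        log_p += math.log(p)
    return log_p


# Toy archive of already-tokenized questions (hypothetical data).
archive = {
    "q1": ["how", "fix", "broken", "laptop"],
    "q2": ["best", "pizza", "recipe"],
}

# Hand-written translation table, trans[t][w] = P(w | t) (invented values).
trans = {
    "fix": {"fix": 0.6, "repair": 0.4},
    "laptop": {"laptop": 0.7, "notebook": 0.3},
}

# Background (collection) model estimated from the archive itself.
all_words = [w for doc in archive.values() for w in doc]
background = {w: c / len(all_words) for w, c in Counter(all_words).items()}

# The query shares no words with q1, yet q1 is reachable through translation.
query = ["repair", "notebook"]
scores = {
    qid: translation_score(query, doc, trans, background)
    for qid, doc in archive.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setup the query has no word overlap with either archived question, so a plain word-matching score could not separate them. The translation table is what lets the first question receive a finite score.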

      • Published in

        CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
        October 2005
        854 pages
        ISBN: 1595931406
        DOI: 10.1145/1099554

        Copyright © 2005 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 31 October 2005

        Qualifiers

        • Article

        Acceptance Rates

        CIKM '05 Paper Acceptance Rate: 77 of 425 submissions, 18%
        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
