ABSTRACT
There has recently been a significant increase in the number of community-based question-and-answer services on the Web, where people answer other people's questions. These services rapidly build up large archives of questions and answers, and these archives are a valuable linguistic resource. One of the major tasks in a question-and-answer service is to find questions in the archive that are semantically similar to a user's question. This allows high-quality answers already in the archive to be retrieved immediately, removing the time lag of waiting for the community to respond. In this paper, we discuss question retrieval methods that use the similarity between answers in the archive to estimate the probabilities of a translation-based retrieval model. We show that this model can find semantically similar questions even when they share relatively few words with the user's question.
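To make the translation-based idea concrete, here is a minimal sketch. It assumes a Berger-Lafferty-style translation language model in which each query word w is generated from an archived question D as P(w|D) = (1 - lam) * sum_t T(w|t) * P_ml(t|D) + lam * P_ml(w|C), with C the whole archive. The function name `translation_log_prob`, the weight `lam=0.2`, and the translation table `trans` are hypothetical illustrations. The abstract says the paper estimates these probabilities from answer similarity; this sketch does not do that estimation and simply hand-sets a toy table.

```python
import math
from collections import Counter

def translation_log_prob(query, question, coll_counts, coll_len, trans, lam=0.2):
    """Log P(query | question) under an assumed translation language model.

    trans[t][w] is a hypothetical translation probability T(w|t); a word t
    with no entry is assumed to translate only to itself (T(t|t) = 1).
    """
    doc_counts = Counter(question)
    doc_len = len(question)
    log_p = 0.0
    for w in query:
        # Probability mass reaching w by "translating" each question word t.
        translated = sum(
            trans.get(t, {t: 1.0}).get(w, 0.0) * c / doc_len
            for t, c in doc_counts.items()
        )
        # Interpolate with the collection model (Jelinek-Mercer smoothing).
        p = (1 - lam) * translated + lam * coll_counts[w] / coll_len
        if p == 0.0:
            return float("-inf")  # w is unreachable from this question
        log_p += math.log(p)
    return log_p

# Toy archive and a hand-set (hypothetical) translation table.
archive = {
    "q1": "how do i reset my router".split(),
    "q2": "best pizza place downtown".split(),
}
coll = Counter(w for q in archive.values() for w in q)
coll_len = sum(coll.values())
trans = {
    "reset": {"reset": 0.6, "restart": 0.4},
    "router": {"router": 0.7, "modem": 0.3},
}

query = "restart modem".split()
scores = {
    qid: translation_log_prob(query, q, coll, coll_len, trans)
    for qid, q in archive.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['q1', 'q2']
```

The query shares no words with "how do i reset my router", yet that question still receives a finite score because "reset" and "router" can translate to "restart" and "modem". This is the lexical-gap effect the abstract describes. In a real system the quality of the ranking depends almost entirely on how T(w|t) is estimated.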
Index Terms
- Finding similar questions in large question and answer archives
Recommendations
FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance
SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Frequently Asked Question (FAQ) retrieval is an important task where the objective is to retrieve an appropriate Question-Answer (QA) pair from a database based on a user's query. We propose a FAQ retrieval system that considers the similarity between a ...
Finding semantically similar questions based on their answers
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A large number of question and answer pairs can be collected from question and answer boards and FAQ pages on the Web. This paper proposes an automatic method of finding the questions that have the same meaning. The method can detect semantically ...
Retrieval models for question and answer archives
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Retrieval in a question and answer archive involves finding good answers for a user's question. In contrast to typical document retrieval, a retrieval model for this task can exploit question similarity as well as ranking the associated answers. In this ...