Article

Finding similar questions in large question and answer archives

Authors:
Jiwoon Jeon, University of Massachusetts, Amherst, MA
W. Bruce Croft, University of Massachusetts, Amherst, MA
Joon Ho Lee, University of Massachusetts, Amherst, MA

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management, October 2005, Pages 84–90
https://doi.org/10.1145/1099554.1099572

Published: 31 October 2005

ABSTRACT

There has recently been a significant increase in the number of community-based question and answer services on the Web where people answer other people's questions. These services rapidly build up large archives of questions and answers, and these archives are a valuable linguistic resource. One of the major tasks in a question and answer service is to find questions in the archive that are semantically similar to a user's question. This enables high-quality answers from the archive to be retrieved and removes the time lag associated with a community-based system. In this paper, we discuss methods for question retrieval that are based on using the similarity between answers in the archive to estimate probabilities for a translation-based retrieval model. We show that with this model it is possible to find semantically similar questions with relatively little word overlap.
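
To make the idea of a translation-based retrieval model concrete, the following is a minimal Python sketch of a generic translation language model, in which each query word may be "generated" by a different word of an archived question. It is an illustration under stated assumptions, not the model from the paper: the function names, the smoothing weight `lam`, the background floor, and the hand-filled translation table `trans_prob` are all hypothetical. In particular, the sketch does not show how the translation probabilities would be estimated from answer similarity.

```python
import math
from collections import Counter


def translation_score(query, question, trans_prob, collection_prob, lam=0.8):
    """Log-probability that `question` generates `query` in a toy translation model.

    Assumed form (illustrative only):
        P(w | D) = (1 - lam) * sum_t T(w | t) * P_ml(t | D) + lam * P(w | C)
    where T(w | t) is a word-to-word translation probability, P_ml is the
    maximum-likelihood estimate from the question D, and P(w | C) is a
    background collection model.
    """
    counts = Counter(question)
    length = len(question) or 1  # avoid division by zero for empty questions
    log_score = 0.0
    for w in query:
        # Probability mass reaching w through any word t of the archived question.
        translated = sum(
            trans_prob.get((w, t), 0.0) * (c / length) for t, c in counts.items()
        )
        # Small floor for words missing from the background model (assumption).
        background = collection_prob.get(w, 1e-6)
        log_score += math.log((1 - lam) * translated + lam * background)
    return log_score


def rank(query, archive, trans_prob, collection_prob):
    """Return archive indices sorted from best to worst match for `query`."""
    scored = [
        (translation_score(query, q, trans_prob, collection_prob), i)
        for i, q in enumerate(archive)
    ]
    return [i for _, i in sorted(scored, reverse=True)]


# Hypothetical data: the query shares no words with either archived question.
archive = [
    ["how", "fix", "laptop", "screen"],
    ["best", "pizza", "recipe"],
]
query = ["repair", "notebook", "display"]
trans_prob = {
    ("repair", "fix"): 0.6,
    ("notebook", "laptop"): 0.7,
    ("display", "screen"): 0.8,
}
ranking = rank(query, archive, trans_prob, collection_prob={})
print(ranking)
```

With this toy table, the first archived question ranks above the second even though it has no word in common with the query, which is the kind of behaviour the abstract describes as finding similar questions with little word overlap.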

Index Terms

Finding similar questions in large question and answer archives
1. Information systems
  1. Information retrieval
  2. Information storage systems

Published in
CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
October 2005
854 pages
ISBN:1595931406
DOI:10.1145/1099554
General Chair:
Otthein Herzog, University of Bremen, Germany
Program Chairs:
Hans-Jörg Schek, University for Health Sciences, Medical Informatics and Technology, Austria
Norbert Fuhr, University of Duisburg-Essen, Germany
Abdur Chowdhury, America Online, USA
Wilfried Teiken, IBM T.J. Watson Research Center, USA
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 October 2005
Author Tags
FAQ retrieval
information retrieval
language models
Qualifiers
- Article
Conference

Acceptance Rates
CIKM '05 Paper Acceptance Rate: 77 of 425 submissions, 18%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
Article Metrics
- Total Citations: 259
- Total Downloads: 1,991
- Downloads (Last 12 months): 44
- Downloads (Last 6 weeks): 6