DOI: 10.1145/1390334.1390377

Research article

Selecting good expansion terms for pseudo-relevance feedback

Published: 20 July 2008

ABSTRACT

Pseudo-relevance feedback assumes that the most frequent terms in the pseudo-feedback documents are useful for retrieval. In this study, we re-examine this assumption and show that it does not hold in reality: many expansion terms identified by traditional approaches are in fact unrelated to the query and harmful to retrieval. We also show that good expansion terms cannot be distinguished from bad ones merely by their distributions in the feedback documents and in the whole collection. We therefore propose to integrate a term classification process that predicts the usefulness of expansion terms, and multiple additional features can be incorporated into this process. Our experiments on three TREC collections show that retrieval effectiveness improves substantially when term classification is used. In addition, we demonstrate that good terms should be identified directly by their likely impact on retrieval effectiveness, i.e., using supervised rather than unsupervised learning.
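The pipeline the abstract describes can be sketched in two stages: first rank candidate expansion terms by comparing their frequency in the pseudo-feedback documents against the whole collection (the traditional heuristic the paper critiques), then keep only the candidates a classifier judges useful. The sketch below is illustrative only, not the paper's implementation: the scoring formula is a generic KL-style divergence contribution, and `looks_useful` is a hand-written stand-in for the trained term classifier.

```python
# Illustrative sketch (not the paper's method): rank PRF candidate terms,
# then filter them with a stand-in for a supervised usefulness classifier.
from collections import Counter
import math

def candidate_terms(feedback_docs, collection_docs, query_terms):
    """Rank terms by how much more frequent they are in the pseudo-feedback
    documents than in the whole collection (a classic PRF heuristic)."""
    fb = Counter(t for d in feedback_docs for t in d)
    coll = Counter(t for d in collection_docs for t in d)
    fb_total = sum(fb.values())
    coll_total = sum(coll.values())
    scores = {}
    for term, f in fb.items():
        if term in query_terms:
            continue
        p_fb = f / fb_total
        # Add-one smoothing so unseen collection terms get nonzero mass.
        p_coll = (coll.get(term, 0) + 1) / (coll_total + len(coll))
        scores[term] = p_fb * math.log(p_fb / p_coll)  # KL-style contribution
    return sorted(scores, key=scores.get, reverse=True)

def looks_useful(term, query_terms, feedback_docs):
    """Stand-in for the supervised classifier: here we only require that the
    term co-occur with some query term inside a feedback document."""
    return any(term in d and any(q in d for q in query_terms)
               for d in feedback_docs)

# Toy data: two senses of "jaguar" mixed in the pseudo-feedback documents.
query = {"jaguar", "speed"}
feedback = [["jaguar", "speed", "animal", "cat", "run"],
            ["jaguar", "car", "speed", "engine"],
            ["animal", "cat", "jaguar", "run"]]
collection = feedback + [["car", "engine", "road"], ["cat", "pet"],
                         ["run", "marathon"], ["road", "trip", "car"]]

ranked = candidate_terms(feedback, collection, query)
good = [t for t in ranked if looks_useful(t, query, feedback)]
print(good[:3])
```

In the paper this filtering step uses features beyond distribution statistics and a classifier trained on terms labeled by their actual impact on retrieval effectiveness; the point of the sketch is only the two-stage shape: generate candidates by distribution, then classify.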


Published in

SIGIR '08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2008, 934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334
Copyright © 2008 ACM

Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%
