ABSTRACT
Pseudo-relevance feedback assumes that the most frequent terms in the pseudo-feedback documents are useful for retrieval. In this study, we re-examine this assumption and show that it does not hold in reality: many expansion terms identified by traditional approaches are in fact unrelated to the query and harmful to retrieval. We also show that good expansion terms cannot be distinguished from bad ones merely by their distributions in the feedback documents and in the whole collection. We therefore propose to integrate a term classification process that predicts the usefulness of expansion terms, into which multiple additional features can be incorporated. Our experiments on three TREC collections show that retrieval effectiveness can be substantially improved when term classification is used. In addition, we demonstrate that good terms should be identified directly according to their likely impact on retrieval effectiveness, i.e. using supervised learning rather than unsupervised learning.
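The term classification process described above can be sketched as a small supervised classifier over per-term features. The snippet below is a minimal illustration, not the paper's actual method: the three features (frequency in feedback documents, frequency in the collection, co-occurrence with the query) and the training data are invented for the example, and a plain logistic-regression classifier stands in for the SVM used in the study.

```python
import math

# Hypothetical features for each candidate expansion term:
#   f1: normalized frequency in the pseudo-feedback documents
#   f2: normalized frequency in the whole collection
#   f3: co-occurrence score with the original query terms
# Labels: 1 = good expansion term, 0 = bad (toy data, for illustration only).
train_x = [
    (0.8, 0.10, 0.9),  # frequent in feedback, rare overall, query-related
    (0.7, 0.60, 0.2),  # frequent everywhere, weakly query-related
    (0.5, 0.05, 0.8),
    (0.9, 0.70, 0.1),
]
train_y = [1, 0, 1, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, epochs=2000):
    """Fit a logistic-regression term classifier by gradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Predicted probability that a candidate term is useful for expansion."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = train(train_x, train_y)
# A term frequent in the feedback documents but common in the collection
# and weakly tied to the query should receive a low score, and vice versa.
print(predict(w, b, (0.8, 0.65, 0.15)))
print(predict(w, b, (0.6, 0.08, 0.85)))
```

In this setup the classifier learns that high collection frequency is a negative signal and query co-occurrence a positive one, mirroring the paper's point that distributional frequency alone is not enough to separate good terms from bad ones.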