ABSTRACT
Pseudo-relevance feedback (PRF) improves search quality by expanding the query with terms from the high-ranking documents of an initial retrieval. Although PRF can often yield large gains in effectiveness, running two queries is time-consuming, which limits its applicability. We describe a PRF method that uses corpus pre-processing to achieve query-time speeds near those of the original, unexpanded queries. Specifically, Relevance Modeling, a language-modeling-based PRF method, can be recast to benefit substantially from finding pairwise document relationships in advance. Using the resulting Fast Relevance Model (fastRM), we substantially reduce online retrieval time and still benefit from expansion. We further explore methods for reducing the preprocessing time and storage requirements of the approach, allowing us to achieve up to a 10% increase in MAP over unexpanded retrieval, while requiring only 1% of the time of standard expansion.
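The abstract does not spell out the recasting, so the following is only a minimal sketch of one way such a precomputation can work, under assumptions of our own: documents are ranked by the cross-entropy between a relevance model and each document model, and the relevance model is a weighted mixture of the feedback documents' models. Under those assumptions the score decomposes into query-independent pairwise document terms that can be stored offline. All names (`DOC_MODELS`, `precompute_pairs`, `fast_scores`, and so on) and all numbers are hypothetical toy values, not the authors' implementation or data.

```python
import math

# Toy smoothed document language models P(w|D); all values are made up.
DOC_MODELS = {
    "d1": {"fast": 0.5, "query": 0.3, "model": 0.2},
    "d2": {"fast": 0.2, "query": 0.2, "model": 0.6},
    "d3": {"fast": 0.1, "query": 0.6, "model": 0.3},
}


def cross_entropy_sim(src, tgt):
    """Return sum_w P(w|src) * log P(w|tgt), a query-independent term."""
    return sum(p * math.log(tgt[w]) for w, p in src.items())


def precompute_pairs(models):
    """Offline step (assumed): score every ordered pair of documents."""
    return {(a, b): cross_entropy_sim(models[a], models[b])
            for a in models for b in models}


def two_pass_scores(models, feedback_weights):
    """Baseline: build the relevance model, then rescore every document."""
    rm = {}
    for doc_id, weight in feedback_weights.items():
        for w, p in models[doc_id].items():
            rm[w] = rm.get(w, 0.0) + weight * p
    return {doc_id: cross_entropy_sim(rm, m) for doc_id, m in models.items()}


def fast_scores(pairs, doc_ids, feedback_weights):
    """Query time: combine stored pair scores instead of re-reading terms."""
    return {d: sum(weight * pairs[(f, d)]
                   for f, weight in feedback_weights.items())
            for d in doc_ids}


# Hypothetical normalized first-pass weights P(D|Q) for two feedback documents.
FEEDBACK = {"d1": 0.7, "d3": 0.3}

PAIRS = precompute_pairs(DOC_MODELS)
slow = two_pass_scores(DOC_MODELS, FEEDBACK)
fast = fast_scores(PAIRS, list(DOC_MODELS), FEEDBACK)

for doc_id in sorted(fast, key=fast.get, reverse=True):
    print(doc_id, round(fast[doc_id], 4))
```

In this toy setting the two scoring routes agree up to floating-point error, because the sum over terms and the sum over feedback documents can be exchanged. A full pairwise table grows quadratically with collection size, which is presumably why the abstract also mentions reducing preprocessing time and storage; the sketch makes no attempt at that.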
Index Terms
- Fast query expansion using approximations of relevance models