ABSTRACT
We present a non-traditional retrieval problem that we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. In such a problem, the utility of a document in a ranking depends on the other documents in the ranking, violating the independent-relevance assumption made by most traditional retrieval methods. Subtopic retrieval poses challenges both for evaluating performance and for developing effective algorithms. We propose a framework for evaluating subtopic retrieval that generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
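To make the dependence between ranked documents concrete, the following is a minimal sketch of greedy MMR reranking together with a simple subtopic-coverage measure. It is an illustration under stated assumptions, not the method evaluated in the paper: the relevance and similarity scores are hand-picked numbers rather than language-model estimates, the names `mmr_rerank`, `subtopic_recall`, `lam`, and the toy documents are hypothetical, and the coverage measure is one plausible reading of "recall over subtopics" that ignores the topic-difficulty and redundancy corrections mentioned above.

```python
from typing import Dict, List, Set


def mmr_rerank(relevance: Dict[str, float],
               similarity: Dict[str, Dict[str, float]],
               lam: float = 0.5,
               k: int = 10) -> List[str]:
    """Greedy MMR: at each step pick the document that maximizes
    lam * relevance - (1 - lam) * (max similarity to documents already picked)."""
    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(doc: str) -> float:
            # Redundancy is 0 for the first pick, when nothing is selected yet.
            redundancy = max((similarity[doc][s] for s in selected), default=0.0)
            return lam * relevance[doc] - (1 - lam) * redundancy
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def subtopic_recall(ranking: List[str],
                    doc_subtopics: Dict[str, Set[str]],
                    n_subtopics: int,
                    k: int) -> float:
    """Fraction of the topic's subtopics covered by the top-k documents
    (an assumed, simplified coverage measure)."""
    covered: Set[str] = set()
    for doc in ranking[:k]:
        covered |= doc_subtopics.get(doc, set())
    return len(covered) / n_subtopics


# Toy data (hypothetical): d1 and d2 are near-duplicates, d3 covers a new subtopic.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
similarity = {
    "d1": {"d2": 0.95, "d3": 0.1},
    "d2": {"d1": 0.95, "d3": 0.1},
    "d3": {"d1": 0.1, "d2": 0.1},
}
doc_subtopics = {"d1": {"a", "b"}, "d2": {"a", "b"}, "d3": {"c"}}

by_relevance = sorted(relevance, key=relevance.get, reverse=True)
by_mmr = mmr_rerank(relevance, similarity, lam=0.5, k=3)

print(by_relevance, subtopic_recall(by_relevance, doc_subtopics, 3, k=2))
print(by_mmr, subtopic_recall(by_mmr, doc_subtopics, 3, k=2))
```

In this toy case the pure relevance ranking places the two near-duplicate documents first and covers two of three subtopics in its top two results, while the MMR ranking promotes the less relevant but novel document and covers all three. The trade-off parameter `lam` controls how strongly novelty is weighed against relevance; `lam = 1` reduces to ordinary relevance ranking.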
Index Terms
- Beyond independent relevance: methods and evaluation metrics for subtopic retrieval