Article

Beyond independent relevance: methods and evaluation metrics for subtopic retrieval

Authors:
Cheng Xiang Zhai, University of Illinois at Urbana-Champaign, Urbana, IL
William W. Cohen, Carnegie Mellon University, Pittsburgh, PA
John Lafferty, Carnegie Mellon University, Pittsburgh, PA

SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
July 2003, Pages 10–17
https://doi.org/10.1145/860435.860440

Published: 28 July 2003

ABSTRACT

We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. In such a problem, the utility of a document in a ranking depends on the other documents in the ranking, violating the assumption of independent relevance made by most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework for evaluating subtopic retrieval that generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
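
To make the idea of dependent relevance concrete, here is a minimal sketch of greedy MMR reranking together with a simple subtopic coverage measure. It is an illustration under stated assumptions, not the method evaluated in the paper: it uses the classic linear MMR form (relevance minus redundancy, traded off by `lam`) with Jaccard overlap of token sets as a stand-in similarity, whereas the abstract describes language-model-based scoring that is not reproduced here. The coverage function only counts the fraction of subtopics seen in the top k documents and does not include the normalization for topic difficulty or redundancy that the abstract mentions. All names, documents, and subtopic labels are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets (a stand-in similarity, an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_rerank(query: Set[str], docs: Dict[str, Set[str]], lam: float = 0.5) -> List[str]:
    """Greedy MMR: repeatedly pick the document that maximizes
    lam * sim(doc, query) - (1 - lam) * max sim(doc, already selected docs)."""
    selected: List[str] = []
    remaining = sorted(docs)  # sorted ids give deterministic tie-breaking

    def score(doc_id: str) -> float:
        relevance = jaccard(docs[doc_id], query)
        redundancy = max(
            (jaccard(docs[doc_id], docs[s]) for s in selected), default=0.0
        )
        return lam * relevance - (1.0 - lam) * redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def subtopic_coverage_at_k(
    ranking: List[str],
    doc_subtopics: Dict[str, Set[str]],
    n_subtopics: int,
    k: int,
) -> float:
    """Fraction of the topic's subtopics covered by the top-k documents."""
    covered: Set[str] = set()
    for doc_id in ranking[:k]:
        covered |= doc_subtopics.get(doc_id, set())
    return len(covered) / n_subtopics


# Hypothetical toy data: d1 and d2 are near-duplicates about one subtopic.
query = {"jaguar", "speed"}
docs = {
    "d1": {"jaguar", "speed", "cat"},
    "d2": {"jaguar", "speed", "cat", "fast"},
    "d3": {"jaguar", "car", "engine"},
}
doc_subtopics = {"d1": {"animal"}, "d2": {"animal"}, "d3": {"car"}}

relevance_only = mmr_rerank(query, docs, lam=1.0)  # ignores redundancy
diversified = mmr_rerank(query, docs, lam=0.5)

print(relevance_only, subtopic_coverage_at_k(relevance_only, doc_subtopics, 2, k=2))
print(diversified, subtopic_coverage_at_k(diversified, doc_subtopics, 2, k=2))
```

On this toy input, pure relevance ranking places the two near-duplicate documents first and covers only one of the two subtopics in the top 2, while the MMR ordering promotes the less similar document and covers both. That is the sense in which a document's utility depends on what is ranked above it.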

Index Terms

1. Information systems
  1. Information retrieval

Published in
SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
July 2003, 490 pages
ISBN: 1581136463
DOI: 10.1145/860435
General Chairs: Charles Clarke (University of Waterloo, Canada), Gordon Cormack (University of Waterloo, Canada)
Program Chairs: Jamie Callan (Carnegie Mellon University, Pittsburgh, PA), David Hawking (Australian National University, Australia), Alan Smeaton (Dublin City University, Ireland)
Copyright © 2003 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
language models
maximal marginal relevance
subtopic retrieval
Qualifiers
- Article

Acceptance Rates
SIGIR '03 paper acceptance rate: 46 of 266 submissions, 17%
Overall acceptance rate: 792 of 3,983 submissions, 20%

Article Metrics
- Total Citations: 392
- Total Downloads: 2,238
- Downloads (Last 12 months): 8
- Downloads (Last 6 weeks): 1