DOI: 10.1145/860435.860440
Article

Beyond independent relevance: methods and evaluation metrics for subtopic retrieval

Published: 28 July 2003

ABSTRACT

We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. In such a problem, the utility of a document in a ranking depends on the other documents in the ranking, violating the assumption of independent relevance made by most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework for evaluating subtopic retrieval that generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
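
To make the dependence between ranked documents concrete, the following is a minimal sketch of a greedy MMR reranker together with a simple subtopic coverage measure. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy documents, the relevance scores, the Jaccard similarity over subtopic labels, and the trade-off weight `lam` are all hypothetical, and the coverage function is only a simplified stand-in for the generalized recall the abstract describes (it ignores topic difficulty and redundancy).

```python
from typing import Callable, Dict, List, Set


def mmr_rerank(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    lam: float = 0.5,
) -> List[str]:
    """Greedy MMR: repeatedly pick the document that best trades off
    relevance against similarity to the documents already ranked."""
    remaining = set(relevance)
    ranking: List[str] = []
    while remaining:
        def score(doc: str) -> float:
            # Redundancy is the similarity to the closest already-ranked document.
            redundancy = max((similarity(doc, s) for s in ranking), default=0.0)
            return lam * relevance[doc] - (1.0 - lam) * redundancy

        # Iterate in sorted order so ties break deterministically by document id.
        best = max(sorted(remaining), key=score)
        ranking.append(best)
        remaining.remove(best)
    return ranking


def subtopic_coverage_at_k(
    ranking: List[str],
    doc_subtopics: Dict[str, Set[str]],
    num_subtopics: int,
    k: int,
) -> float:
    """Fraction of the topic's subtopics covered by the top-k documents."""
    covered: Set[str] = set()
    for doc in ranking[:k]:
        covered |= doc_subtopics.get(doc, set())
    return len(covered) / num_subtopics


# Hypothetical toy data: d1 and d2 cover the same two subtopics, d3 a third one.
doc_subtopics = {"d1": {"a", "b"}, "d2": {"a", "b"}, "d3": {"c"}}
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.6}


def jaccard(x: str, y: str) -> float:
    # Assumption: similarity is computed from subtopic labels for brevity;
    # a real system would compare document text or language models.
    a, b = doc_subtopics[x], doc_subtopics[y]
    return len(a & b) / len(a | b)


baseline = sorted(relevance, key=relevance.get, reverse=True)  # independent relevance
diverse = mmr_rerank(relevance, jaccard, lam=0.5)
```

On this toy data the relevance-only baseline ranks `d1, d2, d3` and covers two of three subtopics in its top two results, while the MMR ordering `d1, d3, d2` covers all three, which is the kind of difference a subtopic-aware metric is meant to expose.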

Published in

SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
July 2003
490 pages
ISBN: 1581136463
DOI: 10.1145/860435

Copyright © 2003 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGIR '03 Paper Acceptance Rate: 46 of 266 submissions, 17%. Overall Acceptance Rate: 792 of 3,983 submissions, 20%.
