DOI: 10.1145/2484028.2484073
Research article

A mutual information-based framework for the analysis of information retrieval systems

Published: 28 July 2013

ABSTRACT

We consider the problem of information retrieval evaluation and the methods and metrics used for such evaluations. We propose a probabilistic framework for evaluation which we use to develop new information-theoretic evaluation metrics. We demonstrate that these new metrics are powerful and generalizable, enabling evaluations heretofore not possible.

We introduce four preliminary uses of our framework: (1) a measure of conditional rank correlation, information tau, a powerful meta-evaluation tool whose use we demonstrate on understanding novelty and diversity evaluation; (2) a new evaluation measure, relevance information correlation, which is correlated with traditional evaluation measures and can be used to (3) evaluate a collection of systems simultaneously, which provides a natural upper bound on metasearch performance; and (4) a measure of the similarity between rankers on judged documents, information difference, which allows us to determine whether systems with similar performance are in fact different.
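All of these measures are built from mutual information between random variables induced by rankings and relevance judgments. As a rough illustration of the underlying quantity only (not the paper's actual constructions of information tau or relevance information correlation), the sketch below computes a plug-in mutual-information estimate between toy relevance grades and the rank "bucket" each of two hypothetical systems assigns to the same judged documents; all names and data are illustrative assumptions.

    import math
    from collections import Counter

    def mutual_information(xs, ys):
        """Plug-in estimate of I(X; Y) in bits from paired observations."""
        n = len(xs)
        px = Counter(xs)            # marginal counts of X
        py = Counter(ys)            # marginal counts of Y
        pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
        mi = 0.0
        for (x, y), c in pxy.items():
            p_joint = c / n
            p_marg = (px[x] / n) * (py[y] / n)
            mi += p_joint * math.log2(p_joint / p_marg)
        return mi

    # Hypothetical toy data: graded relevance for five judged documents, and
    # whether each of two systems ranks them in its top or bottom half.
    grades   = [2, 1, 0, 2, 0]
    system_a = ["top", "top", "bottom", "top", "bottom"]
    system_b = ["top", "bottom", "top", "top", "bottom"]

    print(mutual_information(grades, system_a))    # how informative A's placement is about relevance
    print(mutual_information(grades, system_b))    # same for B
    print(mutual_information(system_a, system_b))  # agreement between the two systems

A higher value in the first two lines would mean a system's placements carry more information about relevance; the third line measures how similar the two systems' placements are, independent of the judgments.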


Published in

SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
July 2013, 1188 pages
ISBN: 9781450320344
DOI: 10.1145/2484028

      Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGIR '13 paper acceptance rate: 73 of 366 submissions (20%). Overall SIGIR acceptance rate: 792 of 3,983 submissions (20%).
