skip to main content
10.1145/1498759.1498766acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
research-article

Diversifying search results

Published:09 February 2009Publication History

ABSTRACT

We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and that both queries and documents may belong to more than one category according to this taxonomy. We present a systematic approach to diversifying results that aims to minimize the risk of dissatisfaction of the average user. We propose an algorithm that well approximates this objective in general, and is provably optimal for a natural special case. Furthermore, we generalize several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification. We demonstrate empirically that our algorithm scores higher in these generalized metrics compared to results produced by commercial search engines.

References

  1. Kannan Achan, Ariel Fuxman, Panayiotis Tsaparas, and Rakesh Agrawal. Using the wisdom of the crowds for keyword generation. In WWW, pages 1--8, 2008.Google ScholarGoogle Scholar
  2. Aris Anagnostopoulos, Andrei Z. Broder, and David Carmel. Sampling search-engine results. In WWW, pages 245--256, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Bookstein. Information retrieval: A sequential learning process. Journal of the American Society for Information Sciences (ASIS), 34(5):331--342, 1983.Google ScholarGoogle Scholar
  4. B. Boyce. Beyond topicality: A two stage view of relevance and the retrieval process. Info. Processing and Management, 18(3):105--109, 1982.Google ScholarGoogle ScholarCross RefCross Ref
  5. Jaime G. Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, pages 335--336, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Harr Chen and David R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, pages 429--436, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Charles L. A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR, pages 659--666, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. W. Goffman. A searching procedure for information retrieval. Info. Storage and Retrieval, 2:73--78, 1964.Google ScholarGoogle ScholarCross RefCross Ref
  9. Dorit Hochbaum, editor. Approximation Algorithms for NP-Hard problems. Springer, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. K. Jarvelin and J. Kekalainen. IR evaluation methods for retrieving highly relevant documents. In SIGIR, pages 41--48, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge Univ Press, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of the approximations for maximizing submodular set functions. Math. Programming, 14:265--294, 1978.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with multi-armed bandits. In ICML, 2008. First presented at NIPS07 Workshop on Machine Learning for Web Search. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Filip Radlinski and Susan T. Dumais. Improving personalized web search using result diversification. In SIGIR, pages 691--692, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Erik Vee, Utkarsh Srivastava, Jayavel Shanmugasundaram, Prashant Bhat, and Sihem Amer-Yahia. Efficient computation of diverse query results. In ICDE, pages 228--236, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. E. M. Voorhees. Overview of the trec 2004 robust retrieval track. In TREC, 2004.Google ScholarGoogle Scholar
  17. Yunjie Xu and Hainan Yin. Novelty and topicality in interactive information retrieval. J. Am. Soc. Inf. Sci. Technol., 59(2):201--215, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. ChengXiang Zhai. Risk Minimization and Language Modeling in Information Retrieval. PhD thesis, Carnegie Mellon University, 2002.Google ScholarGoogle Scholar
  19. ChengXiang Zhai, William W. Cohen, and John D. Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In SIGIR, pages 10--17, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. ChengXiang Zhai and John D. Lafferty. A risk minimzation framework for information retrieval. Info. Processing and Management, 42(1):31--55, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. Improving recommendation lists through topic diversification. In WWW, pages 22--32, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Diversifying search results

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining
      February 2009
      314 pages
      ISBN:9781605583907
      DOI:10.1145/1498759

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 February 2009

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate498of2,863submissions,17%

      Upcoming Conference

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader