DOI: 10.1145/1772690.1772770
research-article

Diversifying web search results

Published: 26 April 2010

ABSTRACT

Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets on the first page of the results. However, the underlying questions of how 'diversity' interplays with 'quality', and when preference should be given to one or both, are not well understood. In this work, we model the problem as expectation maximization and study the challenges of estimating the model parameters and reaching an equilibrium. One model parameter, for example, is the correlation between pages, which we estimate using the textual content of the pages and click data (when available). We conduct experiments on diversifying randomly selected queries from a query log and queries chosen from the disambiguation topics of Wikipedia. On the random queries, our algorithm improves upon Google in terms of diversity, retrieving 14% to 38% more aspects of queries in the top 5 results while maintaining a precision very close to that of Google. On a more selective set of queries that are expected to benefit from diversification, our algorithm improves upon Google in terms of both precision and diversity of the results, and significantly outperforms another baseline system for result diversification.
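
The trade-off between quality and diversity described above can be illustrated with a small sketch. It is not the algorithm of the paper: it assumes that page correlation is approximated by cosine similarity over raw term counts, and that results are picked greedily by relevance minus a similarity penalty. The names `cosine_similarity`, `diversify`, and `penalty` are hypothetical, and the relevance scores are made up for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts.

    Assumption: used here as a crude stand-in for the page correlation
    that the abstract says is estimated from textual content.
    """
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversify(pages, relevance, k, penalty=1.0):
    """Greedily pick up to k page indices.

    Each step takes the page with the highest relevance minus `penalty`
    times its total similarity to the pages already selected.
    """
    selected = []
    remaining = list(range(len(pages)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = sum(
                cosine_similarity(pages[i], pages[j]) for j in selected
            )
            return relevance[i] - penalty * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Illustrative data only: three snippets for the ambiguous query "jaguar".
pages = [
    "jaguar car dealer price",
    "jaguar car price review",
    "jaguar animal habitat rainforest",
]
relevance = [0.9, 0.85, 0.6]  # hypothetical quality scores

top_two = diversify(pages, relevance, k=2)
```

With `penalty=0.0` the sketch reduces to ranking by relevance alone, so both selected pages cover the car aspect; with the default penalty, the second slot goes to the page about the animal, because the second car page is heavily correlated with the first.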

Published in

WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690

Copyright © 2010 International World Wide Web Conference Committee (IW3C2)

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
