ABSTRACT
Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets on the first page of the results. However, the underlying questions of how 'diversity' interplays with 'quality', and when preference should be given to one or both, are not well understood. In this work, we model the problem as expectation maximization and study the challenges of estimating the model parameters and reaching an equilibrium. One model parameter, for example, is the correlation between pages, which we estimate using the textual content of the pages and click data (when available). We conduct experiments on diversifying randomly selected queries from a query log and queries chosen from the disambiguation topics of Wikipedia. On the random queries, our algorithm improves upon Google in terms of diversity, retrieving 14% to 38% more aspects of queries in the top 5 results while maintaining a precision very close to Google's. On a more selective set of queries that are expected to benefit from diversification, our algorithm improves upon Google in terms of both precision and diversity of the results, and significantly outperforms another baseline system for result diversification.