ABSTRACT
We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information and both queries and documents may belong to more than one category of this taxonomy. We present a systematic approach to diversifying results that aims to minimize the risk of dissatisfaction of the average user. We propose an algorithm that approximates this objective well in general and is provably optimal for a natural special case. Furthermore, we generalize several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification. We demonstrate empirically that our algorithm scores higher on these generalized metrics than the results produced by commercial search engines.
Index Terms
- Diversifying search results