ABSTRACT
A result page of a modern web search engine is often much more complicated than a simple list of "ten blue links." In particular, a search engine may combine results from different sources (e.g., Web, News, and Images), and display these as grouped results to provide a better user experience. Such a system is called an aggregated or federated search system.
Because search engines evolve over time, their results need to be constantly evaluated. However, one of the most efficient and widely used evaluation methods, interleaving, cannot be directly applied to aggregated search systems, as it ignores the need to group results originating from the same source (vertical results).
We propose an interleaving algorithm that allows comparisons of search engine result pages containing grouped vertical documents. We compare our algorithm to existing interleaving algorithms and other evaluation methods (such as A/B testing), both on real-life click log data and in simulation experiments. We find that our algorithm yields unbiased and accurate interleaved comparisons whose outcomes are comparable to those of conventional evaluation techniques. We also show that our interleaving algorithm produces a ranking that does not substantially alter the user experience, while being sensitive to changes in both the vertical result block and the non-vertical document rankings. All this makes our proposed interleaving algorithm an essential tool for comparing IR systems with complex aggregated pages.
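To illustrate the limitation noted above, the following minimal sketch implements team-draft interleaving, one of the existing interleaving methods, and checks whether a vertical block stays contiguous in the interleaved list. It is not the algorithm proposed in this paper. The document IDs ("w1", "n1", ...), the function names, and the choice of a two-document News block are assumptions made purely for illustration.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng):
    """Team-draft interleaving of two rankings; it knows nothing about verticals.

    Returns the interleaved list and, per position, the team ("A" or "B")
    that contributed the document (used later to credit clicks).
    """
    interleaved, teams, seen = [], [], set()
    count_a = count_b = 0
    total = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        a_turn = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        source = ranking_a if a_turn else ranking_b
        pick = next((d for d in source if d not in seen), None)
        if pick is None:
            # This ranker has nothing new left, so the other one picks instead.
            a_turn = not a_turn
            source = ranking_a if a_turn else ranking_b
            pick = next(d for d in source if d not in seen)
        interleaved.append(pick)
        teams.append("A" if a_turn else "B")
        seen.add(pick)
        if a_turn:
            count_a += 1
        else:
            count_b += 1
    return interleaved, teams


def is_contiguous(ranking, block_docs):
    """True if all documents of the vertical block occupy adjacent positions."""
    positions = [i for i, d in enumerate(ranking) if d in block_docs]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


if __name__ == "__main__":
    # Hypothetical rankings: "w*" are Web results, "n*" form a News block.
    ranking_a = ["w1", "n1", "n2", "w2", "w3"]
    ranking_b = ["w1", "w2", "w3", "n1", "n2"]
    news_block = {"n1", "n2"}
    for seed in range(5):
        merged, _ = team_draft_interleave(ranking_a, ranking_b, random.Random(seed))
        print(seed, merged, "block intact:", is_contiguous(merged, news_block))
```

Both input rankings present the News documents as one group, yet some interleaved outputs separate them, which is the behavior a vertical-aware interleaving method has to avoid.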
Index Terms
- Evaluating aggregated search using interleaving