ABSTRACT
Search engine click logs provide an invaluable source of relevance information, but this information is biased because we do not know which documents in the result list users actually saw before and after they clicked. If we did, we could estimate document relevance by simple counting. In this paper, we propose a set of assumptions on user browsing behavior that allows the estimation of the probability that a document is seen, thereby providing an unbiased estimate of document relevance. To train, test and compare our model to the best alternatives described in the literature, we gather a large set of real data and carry out an extensive cross-validation experiment. Our solution outperforms all previous models very significantly. As a side effect, we gain insight into the browsing behavior of users, which we can compare to the conclusions of an eye-tracking experiment by Joachims et al. [12]. In particular, our findings confirm that a user almost always sees the document directly after a clicked document. They also explain why documents situated just after a very relevant document are clicked more often.
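The counting argument in the abstract can be illustrated with a minimal sketch. It assumes a simplified examination hypothesis in which a click requires the document to be both seen and relevant, and in which the probability of being seen depends only on rank. This is a simplification for illustration, not the browsing model proposed in the paper; the names `EXAMINATION_PROB`, `naive_ctr`, and `debiased_relevance`, as well as the probability values and the toy log, are hypothetical.

```python
from collections import defaultdict

# Hypothetical probability that a user sees the document at each rank.
# These values are assumptions for illustration, not estimates from the paper.
EXAMINATION_PROB = {1: 1.0, 2: 0.5, 3: 0.25}


def naive_ctr(log):
    """Clicks divided by impressions, ignoring whether the document was seen."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for doc, rank, clicked in log:
        shown[doc] += 1
        clicks[doc] += int(clicked)
    return {doc: clicks[doc] / shown[doc] for doc in shown}


def debiased_relevance(log, examination_prob=EXAMINATION_PROB):
    """Clicks divided by the expected number of times the document was seen."""
    expected_seen = defaultdict(float)
    clicks = defaultdict(int)
    for doc, rank, clicked in log:
        expected_seen[doc] += examination_prob[rank]
        clicks[doc] += int(clicked)
    return {doc: clicks[doc] / expected_seen[doc] for doc in expected_seen}


# Toy log of (document, rank, clicked) impressions.
log = [
    ("a", 1, True), ("a", 1, True), ("a", 1, False), ("a", 1, False),
    ("b", 2, True), ("b", 2, False), ("b", 2, False), ("b", 2, False),
]

print(naive_ctr(log))           # "b" looks half as relevant as "a"
print(debiased_relevance(log))  # after correcting for rank, both score 0.5
```

In this toy log, document "b" receives fewer clicks only because it is shown at a rank that is seen half as often; dividing by the expected number of views instead of the number of impressions removes that difference.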
- E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. In Proceedings of ACM SIGIR 2006, pages 19--26, New York, NY, USA, 2006. ACM Press.
- E. Agichtein, E. Brill, S. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In Proceedings of ACM SIGIR 2006, pages 3--10, New York, NY, USA, 2006. ACM Press.
- H. Becker, C. Meek, and D. M. Chickering. Modeling contextual factors of click rates. In AAAI, pages 1310--1315, 2007.
- A. Broder. A taxonomy of web search. SIGIR Forum, 36(2):3--10, 2002.
- N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In First ACM International Conference on Web Search and Data Mining WSDM 2008, 2008.
- D. Downey, S. T. Dumais, and E. Horvitz. Models of searching and browsing: Languages, studies, and application. In IJCAI, pages 2740--2747, 2007.
- G. Dupret, B. Piwowarski, C. Hurtado, and M. Mendoza. A statistical model of query log generation. In Proceedings of SPIRE 2006, LNCS 4209, pages 217--228. Springer, 2006.
- A. Genkin, D. Lewis, and D. Madigan. Large-scale Bayesian logistic regression for text categorization. Technometrics, 49, 2007.
- L. Granka, T. Joachims, and G. Gay. Eye-tracking analysis of user behavior in www search. In Proceedings of ACM SIGIR 2004, New York, NY, USA, 2004. ACM Press.
- T. Joachims. Optimizing search engines using clickthrough data. In KDD '02: Proceedings of the eighth ACM SIGKDD, pages 133--142, New York, NY, USA, 2002. ACM Press.
- T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay. Accurately interpreting clickthrough data as implicit feedback. In Proceedings of ACM SIGIR 2005, pages 154--161, New York, NY, USA, 2005. ACM Press.
- T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems (TOIS), 25(2), 2007.
- R. W. White and S. M. Drucker. Investigating behavioral variability in web search. In WWW '07, pages 21--30, New York, NY, USA, 2007. ACM.
Index Terms
- A user browsing model to predict search engine click data from past observations.