research-article

A user browsing model to predict search engine click data from past observations.

Authors:
Georges E. Dupret, Yahoo! Research Latin America, Santiago, Chile
Benjamin Piwowarski, Yahoo! Research Latin America, Santiago, Chile

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
Pages 331–338
https://doi.org/10.1145/1390334.1390392

Published: 20 July 2008

ABSTRACT

Search engine click logs provide an invaluable source of relevance information, but this information is biased because we do not know which documents in the result list users actually saw before and after they clicked. If we knew, we could estimate document relevance by simple counting. In this paper, we propose a set of assumptions on user browsing behavior that allows the estimation of the probability that a document is seen, thereby providing an unbiased estimate of document relevance. To train, test, and compare our model with the best alternatives described in the literature, we gather a large set of real data and carry out an extensive cross-validation experiment. Our solution very significantly outperforms all previous models. As a side effect, we gain insight into the browsing behavior of users, which we can compare with the conclusions of the eye-tracking experiments by Joachims et al. [12]. In particular, our findings confirm that a user almost always sees the document directly after a clicked document. They also explain why documents situated just after a very relevant document are clicked more often.
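The contrast between simple counting and an estimate that accounts for whether a document was seen can be illustrated with a minimal Python sketch. This is not the model proposed in the paper: the abstract gives no formulas, so the sketch assumes the per-rank probabilities of being seen are already known (the hypothetical `p_seen` list), whereas the paper estimates such probabilities from click logs. The function names and the toy sessions are invented for illustration.

```python
from collections import defaultdict


def naive_ctr(sessions):
    """Simple counting: clicks divided by the number of times a document was listed."""
    clicks, shown = defaultdict(int), defaultdict(int)
    for ranking, clicked_ranks in sessions:
        for rank, doc in enumerate(ranking):
            shown[doc] += 1
            clicks[doc] += rank in clicked_ranks
    return {doc: clicks[doc] / shown[doc] for doc in shown}


def seen_weighted_ctr(sessions, p_seen):
    """Clicks divided by the expected number of times a document was seen.

    p_seen[rank] is an ASSUMED probability that the result at that rank is seen;
    it must be positive for every rank that appears in the sessions.
    """
    clicks, expected_views = defaultdict(int), defaultdict(float)
    for ranking, clicked_ranks in sessions:
        for rank, doc in enumerate(ranking):
            expected_views[doc] += p_seen[rank]
            clicks[doc] += rank in clicked_ranks
    return {doc: clicks[doc] / expected_views[doc] for doc in expected_views}


# Toy data (hypothetical): each session is (ranked documents, set of clicked ranks).
sessions = [
    (["a", "b"], {0}),
    (["a", "b"], {0}),
    (["a", "b"], {1}),
    (["a", "b"], set()),
]
p_seen = [1.0, 0.5]  # made-up values: rank 0 always seen, rank 1 seen half the time

print(naive_ctr(sessions))
print(seen_weighted_ctr(sessions, p_seen))
```

In this toy data, document `b` is clicked in one of four sessions, so simple counting rates it lower than `a`. Once its clicks are divided by the expected number of times it was seen, the two documents receive the same estimate, which is the kind of position bias the abstract refers to.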

References

[1] E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. In Proceedings of ACM SIGIR 2006, pages 19–26, New York, NY, USA, 2006. ACM Press.
[2] E. Agichtein, E. Brill, S. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In Proceedings of ACM SIGIR 2006, pages 3–10, New York, NY, USA, 2006. ACM Press.
[3] H. Becker, C. Meek, and D. M. Chickering. Modeling contextual factors of click rates. In AAAI, pages 1310–1315, 2007.
[4] A. Broder. A taxonomy of web search. SIGIR Forum, 36(2):3–10, 2002.
[5] N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In First ACM International Conference on Web Search and Data Mining WSDM 2008, 2008.
[6] D. Downey, S. T. Dumais, and E. Horvitz. Models of searching and browsing: Languages, studies, and application. In IJCAI, pages 2740–2747, 2007.
[7] G. Dupret, B. Piwowarski, C. Hurtado, and M. Mendoza. A statistical model of query log generation. In Proceedings of SPIRE 2006, LNCS 4209, pages 217–228. Springer, 2006.
[8] A. Genkin, D. Lewis, and D. Madigan. Large-scale Bayesian logistic regression for text categorization. Technometrics, 49, 2007.
[9] L. Granka, T. Joachims, and G. Gay. Eye-tracking analysis of user behavior in www search. In Proceedings of ACM SIGIR 2004, New York, NY, USA, 2004. ACM Press.
[10] T. Joachims. Optimizing search engines using clickthrough data. In KDD '02: Proceedings of the eighth ACM SIGKDD, pages 133–142, New York, NY, USA, 2002. ACM Press.
[11] T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay. Accurately interpreting clickthrough data as implicit feedback. In Proceedings of ACM SIGIR 2005, pages 154–161, New York, NY, USA, 2005. ACM Press.
[12] T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems (TOIS), 25(2), 2007.
[13] R. W. White and S. M. Drucker. Investigating behavioral variability in web search. In WWW '07, pages 21–30, New York, NY, USA, 2007. ACM.

Index Terms

1. Information systems
  1. Information systems applications


Published in
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334
General Chairs: Tat-Seng Chua (National University of Singapore), Mun-Kew Leong (National Library Board, Singapore)
Program Chairs: Syung Hyon Myaeng (Information and Communications University, Korea), Douglas W. Oard (University of Maryland, College Park, USA), Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy)
Copyright © 2008 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clickthrough data
search engines
statistical model
user behavior
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%