ABSTRACT
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal. While it was recently shown that counterfactual learning-to-rank (LTR) approaches (Joachims et al., 2017) can provably overcome presentation bias when observation propensities are known, it has remained open how to estimate these propensities effectively. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive interventions, or restrictive relevance-modeling assumptions. First, we show how to harvest a specific type of intervention data from the historic feedback logs of multiple different ranking functions, and we show that this data is sufficient for consistent propensity estimation in the position-based model. Second, we propose a new extremum estimator that makes effective use of this data. In an empirical evaluation, the new estimator provides superior propensity estimates in two real-world systems -- Arxiv Full-text Search and Google Drive Search. Beyond these two points, simulation studies show that the method is robust across a wide range of settings.
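To make the setup concrete, the following is a minimal, hypothetical sketch (not the paper's extremum estimator) of the position-based model and of the kind of intervention data that can be harvested from logs of two different rankers. All names, the toy rankers, and the simple click-rate-ratio estimator are illustrative assumptions: under the position-based model, P(click | doc d at rank k) = p_k · rel(d), so for a document that two historic rankers placed at different ranks, the document relevance cancels in the ratio of observed click-through rates, leaving an estimate of the relative propensity.

```python
import random

random.seed(0)

# Position-based model (PBM): P(click | doc d at rank k) = p_k * rel(d).
TRUE_PROP = [1.0, 0.5, 0.25]                      # examination propensity per rank
DOCS = {f"d{i}": 0.2 + 0.6 * random.random() for i in range(3)}

# Two hypothetical historic rankers order the same results differently.
# The same document shown at two different ranks is an intervention we
# harvest from the logs rather than run ourselves.
def rank_A(doc_ids):
    return list(doc_ids)

def rank_B(doc_ids):
    ids = list(doc_ids)
    ids[0], ids[2] = ids[2], ids[0]               # ranker B swaps ranks 1 and 3
    return ids

def simulate(ranker, doc_ids, n_sessions):
    """Log clicks and impressions per (doc, rank) under the PBM."""
    clicks, impressions = {}, {}
    ranking = ranker(doc_ids)
    for _ in range(n_sessions):
        for k, d in enumerate(ranking):
            impressions[(d, k)] = impressions.get((d, k), 0) + 1
            if random.random() < TRUE_PROP[k] * DOCS[d]:
                clicks[(d, k)] = clicks.get((d, k), 0) + 1
    return clicks, impressions

doc_ids = list(DOCS)
cA, iA = simulate(rank_A, doc_ids, 200_000)
cB, iB = simulate(rank_B, doc_ids, 200_000)

# Document d0 sits at rank 1 under A and rank 3 under B, so its relevance
# cancels in the CTR ratio, which estimates p_3 / p_1.
d = doc_ids[0]
ctr_r1 = cA.get((d, 0), 0) / iA[(d, 0)]
ctr_r3 = cB.get((d, 2), 0) / iB[(d, 2)]
print(ctr_r3 / ctr_r1)                            # close to true p_3 / p_1 = 0.25
```

Only the relative propensities p_k / p_1 are identified this way, but that is all inverse-propensity-weighted LTR needs, since a global scaling of the weights does not change the ordering of rankings by estimated risk.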
- Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. 2017. Effective Evaluation using Logged Bandit Feedback from Multiple Loggers. In Proc. of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 687--696.
- Aman Agarwal, Ivan Zaitsev, and Thorsten Joachims. 2018. Counterfactual Learning-to-Rank for Additive Metrics and Deep Models. In ICML Workshop on Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action (CausalML).
- Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proc. of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR). 385--394.
- Ben Carterette and Praveen Chandar. 2018. Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback. In Proc. of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR). 705--714.
- Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. 2012. Large-Scale Validation and Analysis of Interleaved Search Evaluation. ACM Transactions on Information Systems (TOIS), Vol. 30, 1 (2012), 6:1--6:41.
- Olivier Chapelle and Ya Zhang. 2009. A Dynamic Bayesian Network Click Model for Web Search Ranking. In Proc. of the 18th International Conference on World Wide Web (WWW). 1--10.
- Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. 2015. Click Models for Web Search. Morgan & Claypool Publishers.
- Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An Experimental Comparison of Click Position-bias Models. In Proc. of the 1st International Conference on Web Search and Web Data Mining (WSDM). 87--94.
- Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 39, 1 (1977), 1--38.
- Miroslav Dudík, John Langford, and Lihong Li. 2011. Doubly Robust Policy Evaluation and Learning. In Proc. of the 28th International Conference on Machine Learning (ICML). 1097--1104.
- Georges E. Dupret and Benjamin Piwowarski. 2008. A User Browsing Model to Predict Search Engine Click Data from Past Observations. In Proc. of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 331--338.
- Fan Guo, Chao Liu, Anitha Kannan, Tom Minka, Michael Taylor, Yi-Min Wang, and Christos Faloutsos. 2009. Click Chain Model in Web Search. In Proc. of the 18th International Conference on World Wide Web (WWW). 11--20.
- Guido Imbens and Donald Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
- Thorsten Joachims. 2002. Optimizing Search Engines Using Clickthrough Data. In Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 133--142.
- Thorsten Joachims, Laura A. Granka, Bing Pan, Helene Hembrooke, and Geri Gay. 2005. Accurately Interpreting Clickthrough Data as Implicit Feedback. In Proc. of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 154--161.
- Thorsten Joachims, Laura A. Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. 2007. Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search. ACM Transactions on Information Systems (TOIS), Vol. 25, 2, Article 7 (April 2007).
- Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. 2018. Deep Learning with Logged Bandit Feedback. In 6th International Conference on Learning Representations (ICLR).
- Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proc. of the 10th ACM International Conference on Web Search and Data Mining (WSDM). 781--789.
- John Langford, Alexander Strehl, and Jennifer Wortman. 2008. Exploration Scavenging. In Proc. of the 25th International Conference on Machine Learning (ICML). 528--535.
- Lihong Li, Shunbao Chen, Jim Kleban, and Ankur Gupta. 2014. Counterfactual Estimation and Optimization of Click Metrics for Search Engines. CoRR, Vol. abs/1403.1891 (2014).
- Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. 2011. Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms. In Proc. of the 4th International Conference on Web Search and Web Data Mining (WSDM). 297--306.
- Maeve O'Brien and Mark T. Keane. 2006. Modeling Result-List Searching in the World Wide Web: The Role of Relevance Topologies and Trust Bias. In Proc. of the 28th Annual Conference of the Cognitive Science Society (CogSci).
- Matthew Richardson, Ewa Dominowska, and Robert Ragno. 2007. Predicting Clicks: Estimating the Click-through Rate for New Ads. In Proc. of the 16th International Conference on World Wide Web (WWW). 521--530.
- Paul R. Rosenbaum and Donald B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, Vol. 70, 1 (1983), 41--55.
- Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, and Thorsten Joachims. 2016. Recommendations as Treatments: Debiasing Learning and Evaluation. In Proc. of the 33rd International Conference on Machine Learning (ICML). 1670--1679.
- Adith Swaminathan and Thorsten Joachims. 2015a. Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization. Journal of Machine Learning Research (JMLR), Vol. 16 (Sep 2015), 1731--1755.
- Adith Swaminathan and Thorsten Joachims. 2015b. The Self-Normalized Estimator for Counterfactual Learning. In Proc. of the 28th International Conference on Neural Information Processing Systems (NIPS). 3231--3239.
- Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. 2016. Learning to Rank with Selection Bias in Personal Search. In Proc. of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 115--124.
- Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. 2018. Position Bias Estimation for Unbiased Learning to Rank in Personal Search. In Proc. of the 11th ACM International Conference on Web Search and Data Mining (WSDM). 610--618.
- Yisong Yue, Rajan Patel, and Hein Roehrig. 2010. Beyond Position Bias: Examining Result Attractiveness as a Source of Presentation Bias in Clickthrough Data. In Proc. of the 19th International Conference on World Wide Web (WWW). 1011--1018.
Index Terms: Estimating Position Bias without Intrusive Interventions