ABSTRACT
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal. While it was recently shown that counterfactual learning-to-rank (LTR) approaches (Joachims et al., 2017) can provably overcome presentation bias when observation propensities are known, it has remained open how to estimate these propensities effectively. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive interventions, or restrictive relevance-modeling assumptions. First, we show how to harvest a specific type of intervention data from the historic feedback logs of multiple different ranking functions, and we show that this data is sufficient for consistent propensity estimation in the position-based model. Second, we propose a new extremum estimator that makes effective use of this data. In an empirical evaluation, the new estimator provides superior propensity estimates in two real-world systems -- Arxiv Full-text Search and Google Drive Search. Beyond these two points, simulation studies show that the method is robust across a wide range of settings.
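To make the setup concrete, the following is a minimal, hypothetical sketch (not the paper's extremum estimator) of the position-based model and of the kind of intervention data that can be harvested from logs of two different rankers. All names, the toy rankers, and the simple click-rate-ratio estimator are illustrative assumptions: under the position-based model, P(click | doc d at rank k) = p_k · rel(d), so for a document that two historic rankers placed at different ranks, the document relevance cancels in the ratio of observed click-through rates, leaving an estimate of the relative propensity.

```python
import random

random.seed(0)

# Position-based model (PBM): P(click | doc d at rank k) = p_k * rel(d).
TRUE_PROP = [1.0, 0.5, 0.25]                      # examination propensity per rank
DOCS = {f"d{i}": 0.2 + 0.6 * random.random() for i in range(3)}

# Two hypothetical historic rankers order the same results differently.
# The same document shown at two different ranks is an intervention we
# harvest from the logs rather than run ourselves.
def rank_A(doc_ids):
    return list(doc_ids)

def rank_B(doc_ids):
    ids = list(doc_ids)
    ids[0], ids[2] = ids[2], ids[0]               # ranker B swaps ranks 1 and 3
    return ids

def simulate(ranker, doc_ids, n_sessions):
    """Log clicks and impressions per (doc, rank) under the PBM."""
    clicks, impressions = {}, {}
    ranking = ranker(doc_ids)
    for _ in range(n_sessions):
        for k, d in enumerate(ranking):
            impressions[(d, k)] = impressions.get((d, k), 0) + 1
            if random.random() < TRUE_PROP[k] * DOCS[d]:
                clicks[(d, k)] = clicks.get((d, k), 0) + 1
    return clicks, impressions

doc_ids = list(DOCS)
cA, iA = simulate(rank_A, doc_ids, 200_000)
cB, iB = simulate(rank_B, doc_ids, 200_000)

# Document d0 sits at rank 1 under A and rank 3 under B, so its relevance
# cancels in the CTR ratio, which estimates p_3 / p_1.
d = doc_ids[0]
ctr_r1 = cA.get((d, 0), 0) / iA[(d, 0)]
ctr_r3 = cB.get((d, 2), 0) / iB[(d, 2)]
print(ctr_r3 / ctr_r1)                            # close to true p_3 / p_1 = 0.25
```

Only the relative propensities p_k / p_1 are identified this way, but that is all inverse-propensity-weighted LTR needs, since a global scaling of the weights does not change the ordering of rankings by estimated risk.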
- Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. 2017. Effective Evaluation using Logged Bandit Feedback from Multiple Loggers. In Proc. of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 687--696.
- Aman Agarwal, Ivan Zaitsev, and Thorsten Joachims. 2018. Counterfactual Learning-to-Rank for Additive Metrics and Deep Models. In ICML Workshop on Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action (CausalML).
- Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proc. of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR). 385--394.
- Ben Carterette and Praveen Chandar. 2018. Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback. In Proc. of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR). 705--714.
- Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. 2012. Large-Scale Validation and Analysis of Interleaved Search Evaluation. ACM Transactions on Information Systems (TOIS), Vol. 30, 1 (2012), 6:1--6:41.
- Olivier Chapelle and Ya Zhang. 2009. A Dynamic Bayesian Network Click Model for Web Search Ranking. In Proc. of the 18th International Conference on World Wide Web (WWW). 1--10.
- Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. 2015. Click Models for Web Search. Morgan & Claypool Publishers.
- Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An Experimental Comparison of Click Position-bias Models. In Proc. of the 1st International Conference on Web Search and Web Data Mining (WSDM). 87--94.
- Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 39, 1 (1977), 1--38.
- Miroslav Dudík, John Langford, and Lihong Li. 2011. Doubly Robust Policy Evaluation and Learning. In Proc. of the 28th International Conference on Machine Learning (ICML). 1097--1104.
- Georges E. Dupret and Benjamin Piwowarski. 2008. A User Browsing Model to Predict Search Engine Click Data from Past Observations. In Proc. of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 331--338.
- Fan Guo, Chao Liu, Anitha Kannan, Tom Minka, Michael Taylor, Yi-Min Wang, and Christos Faloutsos. 2009. Click Chain Model in Web Search. In Proc. of the 18th International Conference on World Wide Web (WWW). 11--20.
- Guido Imbens and Donald Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
- Thorsten Joachims. 2002. Optimizing Search Engines Using Clickthrough Data. In Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 133--142.
- Thorsten Joachims, Laura A. Granka, Bing Pan, Helene Hembrooke, and Geri Gay. 2005. Accurately Interpreting Clickthrough Data as Implicit Feedback. In Proc. of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 154--161.
- Thorsten Joachims, Laura A. Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. 2007. Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search. ACM Transactions on Information Systems (TOIS), Vol. 25, 2, Article 7 (April 2007).
- Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. 2018. Deep Learning with Logged Bandit Feedback. In 6th International Conference on Learning Representations (ICLR).
- Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proc. of the 10th ACM International Conference on Web Search and Data Mining (WSDM). 781--789.
- John Langford, Alexander Strehl, and Jennifer Wortman. 2008. Exploration Scavenging. In Proc. of the 25th International Conference on Machine Learning (ICML). 528--535.
- Lihong Li, Shunbao Chen, Jim Kleban, and Ankur Gupta. 2014. Counterfactual Estimation and Optimization of Click Metrics for Search Engines. CoRR, Vol. abs/1403.1891 (2014).
- Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. 2011. Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms. In Proc. of the 4th International Conference on Web Search and Web Data Mining (WSDM). 297--306.
- Maeve O'Brien and Mark T. Keane. 2006. Modeling Result-List Searching in the World Wide Web: The Role of Relevance Topologies and Trust Bias. In Proc. of the 28th Annual Conference of the Cognitive Science Society (CogSci).
- Matthew Richardson, Ewa Dominowska, and Robert Ragno. 2007. Predicting Clicks: Estimating the Click-through Rate for New Ads. In Proc. of the 16th International Conference on World Wide Web (WWW). 521--530.
- Paul R. Rosenbaum and Donald B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, Vol. 70, 1 (1983), 41--55.
- Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, and Thorsten Joachims. 2016. Recommendations as Treatments: Debiasing Learning and Evaluation. In Proc. of the 33rd International Conference on Machine Learning (ICML). 1670--1679.
- Adith Swaminathan and Thorsten Joachims. 2015a. Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization. Journal of Machine Learning Research (JMLR), Vol. 16 (Sep 2015), 1731--1755.
- Adith Swaminathan and Thorsten Joachims. 2015b. The Self-Normalized Estimator for Counterfactual Learning. In Proc. of the 28th International Conference on Neural Information Processing Systems (NIPS). 3231--3239.
- Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. 2016. Learning to Rank with Selection Bias in Personal Search. In Proc. of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 115--124.
- Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. 2018. Position Bias Estimation for Unbiased Learning to Rank in Personal Search. In Proc. of the 11th ACM International Conference on Web Search and Data Mining (WSDM). 610--618.
- Yisong Yue, Rajan Patel, and Hein Roehrig. 2010. Beyond Position Bias: Examining Result Attractiveness as a Source of Presentation Bias in Clickthrough Data. In Proc. of the 19th International Conference on World Wide Web (WWW). 1011--1018.
Index Terms: Estimating Position Bias without Intrusive Interventions