ABSTRACT
The Inverse Propensity Score (IPS) estimator is a basic, unbiased, off-policy evaluation technique for measuring the impact of a user-interactive system without serving live traffic. We present our work on applying IPS in real-world settings by addressing several practical challenges, thereby enabling successful policy evaluation. In particular, we show that off-policy evaluation can be impossible in the absence of a complete context, and we describe a systematic way of defining the context.
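To make the estimator concrete, the following is a minimal sketch of the standard IPS value estimate: each logged reward is reweighted by the ratio of the target policy's probability of the logged action to the logging policy's probability. The function name, log-tuple layout, and the toy policies are illustrative assumptions, not taken from the paper.

```python
def ips_estimate(logs, target_policy):
    """Standard IPS estimate of a target policy's value from logged data.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability with which the logging policy
          chose `action` given `context` (must be > 0 for logged actions).
    target_policy(context, action): probability the target policy assigns
          to `action` in `context`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Importance weight corrects for the distribution shift between
        # the logging policy and the target policy.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


# Toy usage: uniform logging over two actions; the target policy always
# picks action 0, which happened to earn reward 1.0 when logged.
logs = [
    (None, 0, 1.0, 0.5),
    (None, 1, 0.0, 0.5),
]
always_action_0 = lambda context, action: 1.0 if action == 0 else 0.0
value = ips_estimate(logs, always_action_0)  # (2.0 * 1.0 + 0.0) / 2 = 1.0
```

The estimate is unbiased only when the logging probabilities are correct and cover every action the target policy can take; the paper's point is that this in turn requires logging a complete context, since a missing confounder makes the recorded propensities wrong for the realized action distribution.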
Index Terms
- Handling Confounding for Realistic Off-Policy Evaluation