ABSTRACT
The Inverse Propensity Score (IPS) estimator is a basic, unbiased, off-policy evaluation technique for measuring the impact of a user-interactive system without serving live traffic. We present our work on applying IPS in real-world settings by addressing several practical challenges, thereby enabling successful policy evaluation. In particular, we show that off-policy evaluation can be impossible in the absence of a complete context, and we describe a systematic way of defining the context.
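To make the estimator concrete, the following is a minimal sketch of the standard IPS value estimate: each logged reward is reweighted by the ratio of the target policy's probability of the logged action to the logging policy's probability. The function name, log-tuple layout, and the toy policies are illustrative assumptions, not taken from the paper.

```python
def ips_estimate(logs, target_policy):
    """Standard IPS estimate of a target policy's value from logged data.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability with which the logging policy
          chose `action` given `context` (must be > 0 for logged actions).
    target_policy(context, action): probability the target policy assigns
          to `action` in `context`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Importance weight corrects for the distribution shift between
        # the logging policy and the target policy.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


# Toy usage: uniform logging over two actions; the target policy always
# picks action 0, which happened to earn reward 1.0 when logged.
logs = [
    (None, 0, 1.0, 0.5),
    (None, 1, 0.0, 0.5),
]
always_action_0 = lambda context, action: 1.0 if action == 0 else 0.0
value = ips_estimate(logs, always_action_0)  # (2.0 * 1.0 + 0.0) / 2 = 1.0
```

The estimate is unbiased only when the logging probabilities are correct and cover every action the target policy can take; the paper's point is that this in turn requires logging a complete context, since a missing confounder makes the recorded propensities wrong for the realized action distribution.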
Index Terms
- Handling Confounding for Realistic Off-Policy Evaluation