CAB: Continuous Adaptive Blending Estimator for Policy Evaluation and Learning

11/06/2018
by   Yi Su, et al.

The ability to perform offline A/B-testing and off-policy learning using logged contextual bandit feedback is highly desirable in a broad range of applications, including recommender systems, search engines, ad placement, and personalized health care. Both offline A/B-testing and off-policy learning require a counterfactual estimator that evaluates how some new policy would have performed if it had been used instead of the logging policy. This paper proposes a new counterfactual estimator, called Continuous Adaptive Blending (CAB), for this policy evaluation problem that combines regression and weighting approaches for an effective bias/variance trade-off. It can be substantially less biased than clipped Inverse Propensity Score (IPS) weighting and the Direct Method, and it can have lower variance than the Doubly Robust and IPS estimators. Experimental results show that CAB provides excellent and reliable estimation accuracy compared to other blended estimators and, unlike the SWITCH estimator, is sub-differentiable, so that it can be used for learning.
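To make the idea of blending regression and weighting concrete, here is a minimal sketch in plain Python. The abstract does not state the estimator's formula, so the blending rule used here is an assumption: the logged reward is weighted by the importance weight clipped at a threshold M, and a reward model covers the probability mass that clipping removes. The function names, argument layout, and the parameter `M` are all hypothetical and chosen only for illustration.

```python
def clipped_ips(actions, rewards, target_probs, logging_probs, M):
    """Clipped IPS: average of min(M, pi/pi0) * observed reward."""
    total = 0.0
    for a, r, tp, lp in zip(actions, rewards, target_probs, logging_probs):
        total += min(M, tp[a] / lp[a]) * r
    return total / len(rewards)


def direct_method(target_probs, reward_model):
    """Direct Method: expected modeled reward under the target policy."""
    total = 0.0
    for tp, rm in zip(target_probs, reward_model):
        total += sum(p * d for p, d in zip(tp, rm))
    return total / len(target_probs)


def cab_style_blend(actions, rewards, target_probs, logging_probs,
                    reward_model, M):
    """Assumed CAB-style blend (not taken from the abstract).

    Per sample:
      weighting part : min(M, pi/pi0) * observed reward
      regression part: sum_y max(pi(y) - M * pi0(y), 0) * modeled reward(y)
    Both parts change continuously with the policy probabilities,
    with no hard switch between the model and the weights.
    """
    total = 0.0
    for a, r, tp, lp, rm in zip(actions, rewards, target_probs,
                                logging_probs, reward_model):
        # Probability mass lost to clipping is handed to the reward model.
        model_part = sum(max(p - M * q, 0.0) * d
                         for p, q, d in zip(tp, lp, rm))
        weight_part = min(M, tp[a] / lp[a]) * r
        total += model_part + weight_part
    return total / len(rewards)


# Toy logged data (made up for illustration): 2 contexts, 2 actions.
actions = [0, 1]                            # action chosen by the logging policy
rewards = [1.0, 0.0]                        # observed reward for that action
target_probs = [[0.5, 0.5], [0.2, 0.8]]     # pi(y | x_i)
logging_probs = [[0.25, 0.75], [0.5, 0.5]]  # pi0(y | x_i)
reward_model = [[0.8, 0.2], [0.4, 0.1]]     # modeled reward for each (x_i, y)

for M in (0.0, 1.0, 100.0):
    estimate = cab_style_blend(actions, rewards, target_probs,
                               logging_probs, reward_model, M)
    print(M, estimate)
```

Under this assumed form, M = 0 reduces the blend to the Direct Method, and a very large M reduces it to unclipped IPS; intermediate values trade bias against variance. Because the blend is built only from `min` and `max` of quantities that are linear in the policy probabilities, it stays sub-differentiable in the target policy, which is the property the abstract contrasts with SWITCH.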


