Efficient Counterfactual Learning from Bandit Feedback

09/10/2018
by Yusuke Narita, et al.

What is the most statistically efficient way to do off-policy evaluation and optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have the lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We also apply our estimators to improve online advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence than a state-of-the-art benchmark.
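The abstract contrasts the proposed estimators with "standard estimators" for off-policy evaluation. As a rough illustration of that baseline only, here is a minimal sketch of the standard inverse-propensity-weighted (IPW) estimator of a counterfactual policy's expected reward from contextual-bandit log data. The function and variable names are illustrative assumptions, not from the paper, and the paper's own lower-variance estimators are not reproduced here.

```python
import numpy as np

def ipw_value(rewards, propensities, target_probs):
    """Standard IPW estimate of a counterfactual (target) policy's expected reward.

    rewards:      observed rewards r_i for the logged actions
    propensities: logging-policy probabilities pi_0(a_i | x_i) > 0
    target_probs: target-policy probabilities pi(a_i | x_i) for the same actions
    """
    weights = target_probs / propensities   # importance weights
    return np.mean(weights * rewards)       # unbiased, but typically high-variance

# Toy usage with synthetic log data (purely illustrative)
rng = np.random.default_rng(0)
n = 1000
logged_actions = rng.integers(0, 2, size=n)            # actions chosen by the logging policy
propensities = rng.uniform(0.2, 0.8, size=n)           # known logging propensities
rewards = rng.binomial(1, 0.3 + 0.2 * logged_actions)  # higher reward rate for action 1
target_probs = np.where(logged_actions == 1, 0.9, 0.1) # target policy favors action 1
print(ipw_value(rewards, propensities, target_probs))
```

The high variance of this weighted average is exactly what motivates the variance-reduced estimators the paper studies.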

Related research:

- Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits (02/03/2022)
- Counterfactual Learning with General Data-generating Policies (12/04/2022)
- Learning from Bandit Feedback: An Overview of the State-of-the-art (09/18/2019)
- A Large-scale Open Dataset for Bandit Algorithms (08/17/2020)
- Cost-Effective Incentive Allocation via Structured Counterfactual Inference (02/07/2019)
- CAB: Continuous Adaptive Blending Estimator for Policy Evaluation and Learning (11/06/2018)
- Safe Counterfactual Reinforcement Learning (02/20/2020)
