Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

02/23/2023
by   Vincent Liu, et al.
1

In this work, we consider the off-policy policy evaluation problem for contextual bandits and finite horizon reinforcement learning in the nonstationary setting. Reusing old data is critical for policy evaluation, but existing estimators that reuse old data introduce large bias such that we can not obtain a valid confidence interval. Inspired from a related field called survey sampling, we introduce a variant of the doubly robust (DR) estimator, called the regression-assisted DR estimator, that can incorporate the past data without introducing a large bias. The estimator unifies several existing off-policy policy evaluation methods and improves on them with the use of auxiliary information and a regression approach. We prove that the new estimator is asymptotically unbiased, and provide a consistent variance estimator to a construct a large sample confidence interval. Finally, we empirically show that the new estimator improves estimation for the current and future policy values, and provides a tight and valid interval estimation in several nonstationary recommendation environments.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/10/2018

More Robust Doubly Robust Off-policy Evaluation

We study the problem of off-policy evaluation (OPE) in reinforcement lea...
research
12/18/2021

Off-Policy Evaluation Using Information Borrowing and Context-Based Switching

We consider the off-policy evaluation (OPE) problem in contextual bandit...
research
08/27/2023

Distributional Off-Policy Evaluation for Slate Recommendations

Recommendation strategies are typically evaluated by using previously lo...
research
07/22/2019

Doubly robust off-policy evaluation with shrinkage

We design a new family of estimators for off-policy evaluation in contex...
research
05/16/2016

Off-policy evaluation for slate recommendation

This paper studies the evaluation of policies that recommend an ordered ...
research
02/06/2021

Bootstrapping Statistical Inference for Off-Policy Evaluation

Bootstrapping provides a flexible and effective approach for assessing t...
research
06/06/2022

Markovian Interference in Experiments

We consider experiments in dynamical systems where interventions on some...

Please sign up or login with your details

Forgot password? Click here to reset