Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

10/28/2021
by   Andrew Bennett, et al.

In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates derived under the assumption of a perfect Markov decision process (MDP) model. Here we address this concern by considering off-policy evaluation in a partially observed MDP (POMDP). Specifically, we consider estimating the value of a given target policy in a POMDP given trajectories with only partial state observations, generated by a different and unknown policy that may depend on the unobserved state. We tackle two questions: what conditions allow us to identify the target policy value from the observed data and, given identification, how best to estimate it. To answer these, we extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible by the existence of so-called bridge functions. We then show how to construct semiparametrically efficient estimators in these settings. We term the resulting framework proximal reinforcement learning (PRL). We demonstrate the benefits of PRL in an extensive simulation study.
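To make the confounding concern in the abstract concrete, here is a minimal toy sketch. It is not the PRL estimator and is not taken from the paper: it is a hypothetical one-step problem in which an unobserved binary state `U` drives both the behavior policy and the reward. All names (`P_U`, `P_A1_GIVEN_U`, `reward`, `true_value`, `naive_value`) and all numbers are assumptions chosen for illustration. The sketch computes, by exact enumeration, the true value of the target policy "always take action 1" and the naive estimate obtained by conditioning on the observed action while ignoring `U`.

```python
# Toy one-step confounded problem (illustrative assumptions only,
# not the setting or method of the paper).

# Distribution of the unobserved state U.
P_U = {0: 0.5, 1: 0.5}

# Behavior policy: probability of taking action 1 given U.
# It depends on U, which the analyst does not observe.
P_A1_GIVEN_U = {0: 0.1, 1: 0.9}


def reward(u, a):
    """Deterministic reward that depends on both U and the action."""
    return u + 0.5 * a


def true_value(target_action=1):
    """Value of the policy 'always play target_action', averaging over U."""
    return sum(p * reward(u, target_action) for u, p in P_U.items())


def naive_value(target_action=1):
    """E[R | A = target_action] under the behavior policy, ignoring U.

    This is what one would estimate from logged data by averaging the
    rewards of the rows where the logged action matches the target action.
    """
    def p_action(u):
        p1 = P_A1_GIVEN_U[u]
        return p1 if target_action == 1 else 1.0 - p1

    # Joint probability of (U = u, A = target_action), then normalize
    # to get the distribution of U among the matching rows.
    joint = {u: P_U[u] * p_action(u) for u in P_U}
    total = sum(joint.values())
    return sum(joint[u] / total * reward(u, target_action) for u in P_U)


if __name__ == "__main__":
    print("true value :", true_value())
    print("naive value:", naive_value())
```

With these assumed numbers the true value is 1.0 while the naive estimate is about 1.4: rows where action 1 was logged over-represent `U = 1`, so the estimate absorbs the effect of the unobserved state. This is the kind of bias that motivates treating the problem as a POMDP rather than an MDP.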

research
07/27/2020

Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders

Off-policy evaluation (OPE) in reinforcement learning is an important pr...
research
02/08/2016

Data-Efficient Reinforcement Learning in Continuous-State POMDPs

We present a data-efficient reinforcement learning algorithm resistant t...
research
10/19/2021

Stateful Offline Contextual Policy Evaluation and Learning

We study off-policy evaluation and learning from sequential data in a st...
research
05/19/2020

Riemannian Proximal Policy Optimization

In this paper, we propose a general Riemannian proximal optimization alg...
research
09/29/2022

Blessing from Experts: Super Reinforcement Learning in Confounded Environments

We introduce super reinforcement learning in the batch setting, which ta...
research
09/21/2022

Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models

We study the problem of off-policy evaluation (OPE) for episodic Partial...
research
10/24/2021

Off-Policy Evaluation in Partially Observed Markov Decision Processes

We consider off-policy evaluation of dynamic treatment rules under the a...
