Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

05/26/2022
by Miao Lu, et al.

We study offline reinforcement learning (RL) in partially observable Markov decision processes (POMDPs). In particular, we aim to learn an optimal policy from a dataset collected by a behavior policy that may depend on the latent state. Such a dataset is confounded in the sense that the latent state simultaneously affects the action and the observation, which poses a fundamental obstacle for existing offline RL algorithms. To address this, we propose the Proxy variable Pessimistic Policy Optimization algorithm, which handles both the confounding bias and the distributional shift between the optimal and behavior policies in the context of general function approximation. At the core of the algorithm is a coupled sequence of pessimistic confidence regions constructed via proximal causal inference, which is formulated as minimax estimation. Under a partial coverage assumption on the confounded dataset, we prove that the algorithm achieves a suboptimality of order n^{-1/2}, where n is the number of trajectories in the dataset. To the best of our knowledge, it is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
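The role of pessimism under partial coverage can be illustrated with a much simpler, generic lower-confidence-bound rule: score each candidate policy by the worst value that is still plausible given the data, then pick the policy with the best worst case (a max-min choice). The sketch below is only that generic principle, assuming a finite set of candidate policies, scalar return samples per policy, and a confidence width of beta / sqrt(n); it does not implement proximal causal inference, the coupled confidence regions, or any handling of confounding from the paper. The names `pessimistic_value`, `select_policy`, and `beta` are hypothetical.

```python
import math
from typing import Dict, List


def pessimistic_value(returns: List[float], beta: float) -> float:
    """Lower confidence bound on a policy's value (hypothetical helper).

    Uses the empirical mean minus a width of beta / sqrt(n), the standard
    concentration scaling. A policy with no samples (no data coverage)
    gets -inf so it can never be selected.
    """
    n = len(returns)
    if n == 0:
        return -math.inf
    mean = sum(returns) / n
    return mean - beta / math.sqrt(n)


def select_policy(data: Dict[str, List[float]], beta: float = 1.0) -> str:
    """Max-min selection: the policy whose worst plausible value is largest."""
    return max(data, key=lambda pi: pessimistic_value(data[pi], beta))


# Toy data: "cautious" is well covered, "risky" looks better but has only
# two samples, and "unseen" never appears in the dataset.
data = {
    "cautious": [0.6] * 100,
    "risky": [0.9, 0.9],
    "unseen": [],
}
chosen = select_policy(data)
```

Here the pessimistic score of "cautious" is about 0.6 - 1/10 = 0.5, while "risky" scores about 0.9 - 1/sqrt(2) ≈ 0.19, so the rule picks "cautious"; with `beta=0.0` it reduces to greedy selection and picks "risky". The shrinking 1/sqrt(n) width is in the same spirit as the n^{-1/2} rate stated above, although here n counts per-policy samples rather than trajectories.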
