Budgeting Counterfactual for Offline RL

07/12/2023
by Yao Liu, et al.

The main challenge of offline reinforcement learning, where data is limited, arises from a sequence of counterfactual reasoning dilemmas over potential actions: what if we were to choose a different course of action? These situations frequently give rise to extrapolation errors, which tend to accumulate exponentially with the problem horizon. Hence, it becomes crucial to recognize that not all decision steps are equally important to the final outcome, and to budget the number of counterfactual decisions a policy makes in order to control the extrapolation. In contrast to existing approaches that regularize either the policy or the value function, we propose a method that explicitly bounds the number of out-of-distribution actions taken during training. Specifically, our method uses dynamic programming to decide where to extrapolate and where not to, subject to an upper bound on the number of decisions that differ from the behavior policy. It balances the potential for improvement from taking out-of-distribution actions against the risk of errors due to extrapolation. Theoretically, we justify our method by the constrained optimality of the fixed-point solution to our Q-update rules. Empirically, we show that the overall performance of our method is better than that of state-of-the-art offline RL methods on tasks in the widely used D4RL benchmarks.
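The core idea of pairing each state with a remaining budget of out-of-distribution decisions can be illustrated with a small sketch. This is not the paper's algorithm: the chain MDP, the left-moving behavior policy, and the `budgeted_values` helper below are all hypothetical, chosen only to show a budget-augmented value iteration that either follows the behavior action for free or deviates from it by spending one unit of budget.

```python
import numpy as np

# Toy chain MDP (illustrative only, not from the paper):
# states 0..3, actions 0 = left / 1 = right, reward 1 on reaching state 3.
n_states, gamma = 4, 0.9

def step(s, a):
    """Deterministic chain dynamics: a=1 moves right, a=0 moves left."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

behavior = lambda s: 0  # the logged data only ever moves left

def budgeted_values(budget, iters=100):
    # V[s, b] = value of state s with b counterfactual decisions remaining.
    V = np.zeros((n_states, budget + 1))
    for _ in range(iters):
        newV = np.zeros_like(V)
        for s in range(n_states):
            for b in range(budget + 1):
                # In-distribution choice: follow the behavior action,
                # keeping the remaining budget intact.
                s2, r = step(s, behavior(s))
                vals = [r + gamma * V[s2, b]]
                # Counterfactual choice: deviate from the behavior policy,
                # spending one unit of budget (only allowed while b > 0).
                if b > 0:
                    for a in range(2):
                        if a != behavior(s):
                            s2, r = step(s, a)
                            vals.append(r + gamma * V[s2, b - 1])
                newV[s, b] = max(vals)
        V = newV
    return V

V = budgeted_values(budget=3)
print(V[0, 0])  # 0.0: with zero budget the policy can never leave the left-drifting data
print(V[0, 3])  # ~0.81: three deviations suffice to reach the rewarding state
```

The dynamic program makes the trade-off in the abstract concrete: each extra unit of budget can only be spent where deviating from the behavior policy actually improves the backed-up value, so counterfactual decisions concentrate on the steps that matter for the final outcome.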

