Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning

07/19/2021
by   Haoran Xu, et al.
0

We study the problem of safe offline reinforcement learning (RL), the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potential large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/19/2022

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

We consider the offline constrained reinforcement learning (RL) problem,...
research
06/05/2023

Survival Instinct in Offline Reinforcement Learning

We present a novel observation about the behavior of offline reinforceme...
research
02/10/2022

SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition

Though many reinforcement learning (RL) problems involve learning polici...
research
09/18/2023

Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration

Safe Reinforcement Learning (RL) aims to find a policy that achieves hig...
research
08/10/2022

Robust Reinforcement Learning using Offline Data

The goal of robust reinforcement learning (RL) is to learn a policy that...
research
01/28/2023

SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning

Offline safe RL is of great practical relevance for deploying agents in ...
research
07/16/2022

BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion

We utilize an offline reinforcement learning (RL) model for sequential t...

Please sign up or login with your details

Forgot password? Click here to reset