Safe Exploration in Continuous Action Spaces

01/26/2018
by Gal Dalal, et al.

We address the problem of deploying a reinforcement learning (RL) agent on a physical system, such as a datacenter cooling unit or a robot, where critical constraints must never be violated. We show how to exploit the typically smooth dynamics of these systems to enable RL algorithms to never violate constraints during learning. Our technique adds a safety layer directly to the policy; this layer analytically solves an action correction formulation for each state. The novel closed-form solution is made possible by a linearized model, learned on past trajectories consisting of arbitrary actions. This mimics real-world circumstances in which data logs were generated by a behavior policy that is impractical to describe mathematically; such cases render the known safety-aware off-policy methods inapplicable. We demonstrate the efficacy of our approach on new representative physics-based environments, and succeed where reward shaping fails by maintaining zero constraint violations.
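
The abstract does not spell out the correction itself, so the following is only a minimal sketch of what a closed-form safety layer of this kind might look like. It assumes each constraint is modeled linearly in the action, c_i(s') ≈ c_i(s) + g_i · a, that the corrected action is the closest one (in Euclidean distance) satisfying c_i(s) + g_i · a ≤ C_i, and that at most one constraint is active at a time. The function name `safety_layer` and its parameters are hypothetical, and in practice the sensitivity vectors `g` would come from the learned model rather than being given by hand.

```python
from typing import List, Sequence


def safety_layer(action: Sequence[float],
                 c_now: Sequence[float],
                 g: Sequence[Sequence[float]],
                 c_max: Sequence[float]) -> List[float]:
    """Correct `action` against linearized constraints.

    Assumption (not from the abstract): constraint i is predicted as
    c_now[i] + dot(g[i], action) and must stay <= c_max[i], and at most
    one constraint is active, so a single projection step suffices.
    """
    best_lam, best_g = 0.0, None
    for c_i, g_i, limit in zip(c_now, g, c_max):
        g_sq = sum(x * x for x in g_i)
        if g_sq == 0.0:
            continue  # the action has no modeled effect on this constraint
        # Predicted amount by which the constraint would be exceeded.
        violation = sum(x * a for x, a in zip(g_i, action)) + c_i - limit
        # Lagrange multiplier of the projection, clipped at zero.
        lam = max(violation / g_sq, 0.0)
        if lam > best_lam:
            best_lam, best_g = lam, g_i
    if best_g is None:
        return list(action)  # no predicted violation: keep the action
    # Move the action along -g of the most violated constraint.
    return [a - best_lam * x for a, x in zip(action, best_g)]


# One constraint at value 0.5 with limit 1.0; the first action
# dimension raises it one-for-one, so 1.0 is cut back to 0.5.
corrected = safety_layer([1.0, 0.0], [0.5], [[1.0, 0.0]], [1.0])
print(corrected)  # [0.5, 0.0]
```

Because the correction is a single projection per state, it costs only a few dot products and can sit after any policy's output without changing how that policy is trained.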

