Safe Deep Reinforcement Learning by Verifying Task-Level Properties

by Enrico Marchesini, et al.

Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL). However, the cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space. Such an encoding requires the agent to visit numerous unsafe states to learn a cost-value function to drive the learning process toward safety, which increases the number of unsafe interactions and decreases sample efficiency. In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric. This metric is computed by verifying task-level properties, shaped as input-output conditions, and it is used as a penalty to bias the policy away from unsafe states without learning an additional value function. We investigate the benefits of using the violation metric in standard Safe DRL benchmarks and robotic mapless navigation tasks. The navigation experiments bridge the gap between Safe DRL and robotics, introducing a framework that allows rapid testing on real robots. Our experiments show that policies trained with the violation penalty outperform Safe DRL baselines and significantly reduce the number of visited unsafe states.
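The core idea above can be illustrated with a minimal sketch (not the authors' implementation): safety properties are expressed as input-output conditions over states and actions, the violation metric counts how many of them a policy decision breaks, and the count is subtracted from the task reward as a penalty. All names, thresholds, and the example navigation property below are hypothetical.

```python
# Hedged sketch of a violation-based reward penalty, assuming properties
# are encoded as input-output (precondition/postcondition) pairs.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Property:
    """If `precondition(state)` holds, `postcondition(action)` must hold."""
    precondition: Callable[[Sequence[float]], bool]
    postcondition: Callable[[int], bool]


def violation(state: Sequence[float], action: int,
              properties: List[Property]) -> int:
    """Count the task-level properties violated by the decision (state, action)."""
    return sum(1 for p in properties
               if p.precondition(state) and not p.postcondition(action))


def shaped_reward(reward: float, state: Sequence[float], action: int,
                  properties: List[Property], lam: float = 1.0) -> float:
    """Penalize the task reward by the violation metric.

    Unlike a cost-value function, this requires no extra learned critic:
    the penalty is computed directly from the verified properties.
    """
    return reward - lam * violation(state, action, properties)


# Hypothetical mapless-navigation-style property: "if the front range
# reading is below 0.2 m, action 0 (move forward) must not be chosen".
props = [Property(lambda s: s[0] < 0.2, lambda a: a != 0)]
```

For example, driving forward with an obstacle 0.1 m ahead violates the single property above, so `shaped_reward(1.0, [0.1], 0, props, lam=0.5)` returns `0.5` instead of `1.0`, biasing the policy away from the unsafe decision.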

Online Safety Property Collection and Refinement for Safe Deep Reinforcement Learning in Mapless Navigation

Safety is essential for deploying Deep Reinforcement Learning (DRL) algo...

Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Though deep reinforcement learning (DRL) has obtained substantial succes...

Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation

We propose a novel benchmark environment for Safe Reinforcement Learning...

ReCCoVER: Detecting Causal Confusion for Explainable Reinforcement Learning

Despite notable results in various fields over the recent years, deep re...

Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning

Deep reinforcement learning algorithms can learn complex behavioral skil...

AMS-DRL: Learning Multi-Pursuit Evasion for Safe Targeted Navigation of Drones

Safe navigation of drones in the presence of adversarial physical attack...

Safe Deep RL for Intraoperative Planning of Pedicle Screw Placement

Spinal fusion surgery requires highly accurate implantation of pedicle s...