Interpretable Multi Time-scale Constraints in Model-free Deep Reinforcement Learning for Autonomous Driving

by Gabriel Kalweit, et al.

In many real-world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a list of constraints. Classical methods based on reward shaping, i.e., a weighted combination of different objectives in the reward signal, or Lagrangian methods, which include constraints in the loss function, offer no guarantee that the agent satisfies the constraints at all points in time, and they lack interpretability. When a discrete policy is extracted from an action-value function, safe actions can be ensured by restricting the action space at maximization, but this can lead to sub-optimal solutions among feasible alternatives. In this work, we propose Multi Time-scale Constrained DQN, a novel algorithm that restricts the action space directly in the Q-update to learn the optimal Q-function for the constrained MDP and the corresponding safe policy. In addition to single-step constraints, which refer only to the next action, we introduce a formulation for approximate multi-step constraints under the current target policy, based on truncated value-functions, to enhance interpretability. We compare our algorithm to reward shaping and Lagrangian methods in the application of high-level decision making in autonomous driving, considering constraints for safety, keeping right, and comfort. We train our agent in the open-source simulator SUMO and on the real HighD data set.
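The idea of restricting the action space inside the Q-update, rather than only at action selection, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `safe_mask` argument, and the toy numbers are assumptions made for illustration, and only single-step constraints are covered. The sketch computes a one-step temporal-difference target in which the maximization over next actions runs only over actions marked as safe, and contrasts it with the usual unconstrained target.

```python
import numpy as np


def unconstrained_q_target(reward, q_next, gamma=0.99, done=False):
    """Standard Q-learning target: maximize over all next actions."""
    if done:
        return float(reward)
    return float(reward + gamma * np.max(np.asarray(q_next, dtype=float)))


def constrained_q_target(reward, q_next, safe_mask, gamma=0.99, done=False):
    """Q-learning target that maximizes only over next actions marked safe.

    `safe_mask` is a hypothetical boolean vector (one entry per action)
    produced by some single-step constraint check on the next state.
    """
    if done:
        return float(reward)
    q_next = np.asarray(q_next, dtype=float)
    safe_mask = np.asarray(safe_mask, dtype=bool)
    if not safe_mask.any():
        raise ValueError("no safe action available in the next state")
    # Unsafe actions are excluded from the max by setting them to -inf.
    masked = np.where(safe_mask, q_next, -np.inf)
    return float(reward + gamma * masked.max())


# Toy example: action 0 has the highest value but is marked unsafe.
q_next = [5.0, 2.0, 3.0]
safe = [False, True, True]
print(unconstrained_q_target(1.0, q_next, gamma=0.5))      # 3.5
print(constrained_q_target(1.0, q_next, safe, gamma=0.5))  # 2.5
```

In this toy case the unconstrained target bootstraps from an action that the agent would never be allowed to take, whereas the constrained target bootstraps from the best safe action, so the learned values reflect the restricted action set.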




Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

During initial iterations of training in most Reinforcement Learning (RL...

Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

In the trial-and-error mechanism of reinforcement learning (RL), a notor...

Provably Safe Reinforcement Learning with Step-wise Violation Constraints

In this paper, we investigate a novel safe reinforcement learning proble...

Constrained Exploration and Recovery from Experience Shaping

We consider the problem of reinforcement learning under safety requireme...

Deep Inverse Q-learning with Constraints

Popular Maximum Entropy Inverse Reinforcement Learning approaches requir...

Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways

Well-established optimization-based methods can guarantee an optimal tra...

Dynamic Input for Deep Reinforcement Learning in Autonomous Driving

In many real-world decision making problems, reaching an optimal decisio...
