A Bayesian Approach to Robust Reinforcement Learning

05/20/2019
by   Esther Derman, et al.
0

Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior. In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty set and a robust optimal policy can be derived under the worst-case scenario. In this study, we address the issue of learning in RMDPs using a Bayesian approach. We introduce the Uncertainty Robust Bellman Equation (URBE) which encourages safe exploration for adapting the uncertainty set to new observations while preserving robustness. We propose a URBE-based algorithm, DQN-URBE, that scales this method to higher dimensional domains. Our experiments show that the derived URBE-based strategy leads to a better trade-off between less conservative solutions and robustness in the presence of model misspecification. In addition, we show that the DQN-URBE algorithm can adapt significantly faster to changing dynamics online compared to existing robust techniques with fixed uncertainty sets.

READ FULL TEXT

page 7

page 8

research
03/11/2018

Soft-Robust Actor-Critic Policy-Gradient

Robust Reinforcement Learning aims to derive an optimal behavior that ac...
research
08/22/2023

Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes

The robust constrained Markov decision process (RCMDP) is a recent task-...
research
02/14/2022

Robust Policy Learning over Multiple Uncertainty Sets

Reinforcement learning (RL) agents need to be robust to variations in sa...
research
09/03/2023

Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

In robust Markov decision processes (RMDPs), it is assumed that the rewa...
research
05/31/2022

Robust Anytime Learning of Markov Decision Processes

Markov decision processes (MDPs) are formal models commonly used in sequ...
research
07/15/2023

Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation

Robustness has been extensively studied in reinforcement learning (RL) t...
research
01/23/2019

Robust temporal difference learning for critical domains

We present a new Q-function operator for temporal difference (TD) learni...

Please sign up or login with your details

Forgot password? Click here to reset