Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

11/28/2019
by Qi Zhou, et al.

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, because learned models inevitably contain errors, model-based methods struggle to match the asymptotic performance of model-free methods. In this paper, we propose Policy Optimization with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve asymptotic performance by using the uncertainty in Q-values. We derive an upper bound on this uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate overfitting of the policy to inaccurate models. Experiments show that POMBU can outperform existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.
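
To make the idea of conservative, uncertainty-aware optimization concrete, here is a minimal sketch. It is not the paper's uncertainty bound or the POMBU algorithm. As a stand-in assumption, it takes the uncertainty of a Q-value to be the standard deviation across a hypothetical ensemble of learned models, and it penalizes each action's mean Q-value by that spread. The function name `conservative_scores` and the `penalty` parameter are illustrative only.

```python
import numpy as np

def conservative_scores(q_ensemble, penalty=1.0):
    """Penalize mean Q-values by ensemble disagreement.

    q_ensemble: array of shape (n_models, n_actions), one row of Q-value
    estimates per learned model (hypothetical ensemble, an assumption here).
    Returns (scores, uncertainty), each of shape (n_actions,).
    """
    q = np.asarray(q_ensemble, dtype=float)
    mean = q.mean(axis=0)          # average estimate over models
    uncertainty = q.std(axis=0)    # disagreement as an uncertainty proxy
    return mean - penalty * uncertainty, uncertainty

# Toy data: all models agree on action 0; they disagree strongly on action 1.
q_ensemble = np.array([
    [1.0, 3.0],
    [1.0, 0.0],
    [1.0, 3.0],
    [1.0, 0.0],
])

scores, uncertainty = conservative_scores(q_ensemble, penalty=1.0)
greedy_action = int(np.argmax(q_ensemble.mean(axis=0)))
conservative_action = int(np.argmax(scores))
print(greedy_action, conservative_action)
```

In this toy case the greedy choice follows the higher mean estimate of action 1, while the penalized score prefers action 0, whose value all models agree on. That is the sense in which a conservative objective can reduce overfitting to an inaccurate model.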

research
06/25/2019

Uncertainty-aware Model-based Policy Optimization

Model-based reinforcement learning has the potential to be more sample e...
research
05/30/2018

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Model-based reinforcement learning (RL) algorithms can attain excellent ...
research
12/16/2021

Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic

Model-based reinforcement learning algorithms, which aim to learn a mode...
research
07/04/2020

Bidirectional Model-based Policy Optimization

Model-based reinforcement learning approaches leverage a forward dynamic...
research
05/15/2019

Reinforcement Learning for Robotics and Control with Active Uncertainty Reduction

Model-free reinforcement learning based methods such as Proximal Policy ...
research
10/15/2021

On-Policy Model Errors in Reinforcement Learning

Model-free reinforcement learning algorithms can compute policy gradient...
