Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

01/15/2021
by Xinyue Chen, et al.
Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved much higher sample efficiency than previous model-free methods on continuous-action deep reinforcement learning (DRL) benchmarks. In this paper, we introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ), and show that its performance is just as good as, if not better than, a state-of-the-art model-based algorithm on the MuJoCo benchmark. Moreover, REDQ achieves this performance using fewer parameters than the model-based method, and with less wall-clock run time. REDQ has three carefully integrated ingredients that allow it to achieve its high performance: (i) a UTD ratio >> 1; (ii) an ensemble of Q functions; (iii) in-target minimization across a random subset of Q functions from the ensemble. Through carefully designed experiments, we provide a detailed analysis of REDQ and related model-free algorithms. To our knowledge, REDQ is the first successful model-free DRL algorithm for continuous-action spaces using a UTD ratio >> 1.
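
To make ingredient (iii) concrete, the following is a minimal NumPy sketch of in-target minimization over a random subset of an ensemble. It is an illustration under stated assumptions, not the authors' implementation: the function name `redq_target`, the array layout, and the default values for `subset_size` and `gamma` are hypothetical, and the sketch computes a plain Bellman target without any additional terms the full algorithm may use.

```python
import numpy as np

def redq_target(rewards, dones, next_q, subset_size=2, gamma=0.99, rng=None):
    """Bellman target using the minimum over a random subset of the ensemble.

    rewards, dones: arrays of shape (batch,)
    next_q: array of shape (ensemble_size, batch) holding each target
            Q function's estimate for the next state-action pair.
    subset_size and gamma defaults are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    ensemble_size = next_q.shape[0]
    # Ingredient (iii): draw a random subset of Q functions, without replacement.
    idx = rng.choice(ensemble_size, size=subset_size, replace=False)
    # Minimize across the sampled subset only, not the whole ensemble.
    min_q = next_q[idx].min(axis=0)
    # No bootstrapping past terminal transitions (dones == 1).
    return rewards + gamma * (1.0 - dones) * min_q, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    next_q = rng.normal(size=(10, 4))   # ingredient (ii): an ensemble of 10 estimates
    rewards = np.ones(4)
    dones = np.zeros(4)
    utd_ratio = 20                      # ingredient (i): many updates per environment step
    for _ in range(utd_ratio):
        # A fresh subset is drawn for every update.
        target, idx = redq_target(rewards, dones, next_q, rng=rng)
    print(target.shape)
```

In a full training loop, every Q function in the ensemble would be regressed toward this shared target, and the loop over `utd_ratio` would sample a new mini-batch from the replay buffer on each iteration.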

