Dropout Q-Functions for Doubly Efficient Reinforcement Learning

10/05/2021
by Takuya Hiraoka, et al.

Randomized ensemble double Q-learning (REDQ) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC). To improve REDQ's computational efficiency, we propose Dr.Q, a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connections and layer normalization. Despite its simple implementation, our experimental results indicate that Dr.Q is doubly (sample and computationally) efficient. It achieved sample efficiency comparable to that of REDQ, computational efficiency much better than that of REDQ, and computational efficiency comparable to that of SAC.
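The abstract describes the dropout Q-function only at a high level. The following plain-Python sketch is one possible reading of it: each hidden block applies a linear layer, dropout, layer normalization, and ReLU, and a soft (SAC-style) TD target takes the minimum over a small ensemble of such Q-functions. The layer ordering, the dropout rate of 0.1, the omission of learnable layer-norm parameters and target networks, taking the minimum over the whole ensemble, and the names `DropoutQFunction` and `soft_td_target` are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift, an assumption)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the survivors."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Uniform initialization scaled by fan-in; biases start at zero."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class DropoutQFunction:
    """Hypothetical Q(s, a): hidden blocks of Linear -> Dropout -> LayerNorm -> ReLU."""

    def __init__(self, obs_dim, act_dim, hidden=(16, 16), rate=0.1, seed=0):
        self.rng = random.Random(seed)
        self.rate = rate  # illustrative dropout rate, not taken from the paper
        dims = [obs_dim + act_dim, *hidden]
        self.hidden = [init_layer(dims[i], dims[i + 1], self.rng) for i in range(len(hidden))]
        self.out = init_layer(dims[-1], 1, self.rng)

    def __call__(self, obs, act, training=True):
        h = list(obs) + list(act)  # Q-function input is the concatenated (s, a)
        for weights, bias in self.hidden:
            h = linear(h, weights, bias)
            h = dropout(h, self.rate, training, self.rng)
            h = layer_norm(h)
            h = [max(0.0, v) for v in h]  # ReLU
        return linear(h, *self.out)[0]  # scalar Q-value


def soft_td_target(q_ensemble, reward, done, next_obs, next_act, next_logp,
                   gamma=0.99, alpha=0.2):
    """r + gamma * (1 - done) * (min_i Q_i(s', a') - alpha * log pi(a'|s'))."""
    # Minimum over every ensemble member; dropout stays active (training=True default).
    q_min = min(q(next_obs, next_act) for q in q_ensemble)
    return reward + gamma * (1.0 - done) * (q_min - alpha * next_logp)
```

For example, `soft_td_target([DropoutQFunction(3, 1, seed=0), DropoutQFunction(3, 1, seed=1)], reward=1.0, done=0.0, next_obs=[0.1, 0.2, 0.3], next_act=[0.5], next_logp=-0.7)` builds a two-member ensemble. Because each call re-samples dropout masks, repeated targets differ slightly; calling a Q-function with `training=False` disables dropout and makes its output deterministic.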
