Self-Tuning Deep Reinforcement Learning

02/28/2020
by   Tom Zahavy, et al.

Reinforcement learning (RL) algorithms often require expensive manual or automated hyperparameter searches in order to perform well on a new domain. This need is particularly acute in modern deep RL architectures, which often incorporate many modules and multiple loss functions. In this paper, we take a step towards addressing this issue by using metagradients (Xu et al., 2018) to tune these hyperparameters via differentiable cross-validation, whilst the agent interacts with and learns from the environment. We present the Self-Tuning Actor Critic (STAC), which uses this process to tune the hyperparameters of the usual loss function of the IMPALA actor-critic agent (Espeholt et al., 2018), to learn the hyperparameters that define auxiliary loss functions, and to balance trade-offs in off-policy learning by introducing and adapting the hyperparameters of a novel leaky V-trace operator. The method is simple to use, sample efficient, and does not require a significant increase in compute. Ablative studies show that the overall performance of STAC improves as we adapt more hyperparameters. When applied to 57 games on the Atari 2600 environment over 200 million frames our algorithm improves the median human normalized score of the baseline from 243
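To make the idea of tuning a hyperparameter by differentiating through the learner's own update more concrete, here is a minimal sketch on a toy scalar problem. It is not the STAC agent or the IMPALA loss: the quadratic losses, the targets `A`, `B`, `C`, the step sizes `ALPHA` and `BETA`, and the function names are all assumptions made up for illustration. The parameter `theta` takes one gradient step on an inner loss whose auxiliary term is weighted by a hyperparameter `eta`; a separate outer ("validation") loss is then evaluated on the updated parameter, and `eta` is adjusted along the gradient of that outer loss obtained via the chain rule.

```python
# Toy metagradient sketch (hypothetical scalar problem, not the STAC agent).
A, B, C = 0.0, 2.0, 1.5   # main, auxiliary, and validation targets (made up)
ALPHA, BETA = 0.1, 1.0    # inner and meta step sizes (assumed values)


def inner_grad(theta, eta):
    """Gradient w.r.t. theta of 0.5*(theta-A)**2 + eta*0.5*(theta-B)**2."""
    return (theta - A) + eta * (theta - B)


def outer_loss(theta):
    """Held-out ("validation") loss used to judge the updated parameter."""
    return 0.5 * (theta - C) ** 2


def self_tune(steps=3000, theta=0.0, eta=1.0, tune=True):
    """Run the inner update; if tune is True, also adapt eta online."""
    for _ in range(steps):
        # Inner step: ordinary gradient descent on the eta-weighted loss.
        new_theta = theta - ALPHA * inner_grad(theta, eta)
        if tune:
            # Chain rule through the inner update:
            #   d outer / d eta = (new_theta - C) * d new_theta / d eta
            #   d new_theta / d eta = -ALPHA * (theta - B)
            meta_grad = (new_theta - C) * (-ALPHA * (theta - B))
            eta -= BETA * meta_grad
        theta = new_theta
    return theta, eta


if __name__ == "__main__":
    tuned_theta, tuned_eta = self_tune()
    fixed_theta, fixed_eta = self_tune(tune=False)
    print("tuned:", tuned_theta, tuned_eta, outer_loss(tuned_theta))
    print("fixed:", fixed_theta, fixed_eta, outer_loss(fixed_theta))
```

In this toy setting the hyperparameter is adapted during the same run that trains `theta`, with no separate search over values of `eta`; calling `self_tune(tune=False)` gives the fixed-hyperparameter baseline for comparison. A real agent would compute the same chain-rule term with automatic differentiation over many parameters rather than by hand.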


