Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

Multi-step methods such as Retrace(λ) and n-step Q-learning have become a crucial component of modern deep reinforcement learning agents. These methods are often evaluated only as part of larger architectures, and their evaluations rarely include enough samples to draw statistically significant conclusions about their performance. This methodology makes it difficult to understand how particular algorithmic details of multi-step methods influence learning. In this paper we combine the n-step action-value algorithms Retrace, Q-learning, Tree Backup, Sarsa, and Q(σ) with an architecture analogous to DQN. We test the performance of all of these algorithms in the Mountain Car environment; this choice of environment allows for faster training times and larger sample sizes. We present statistical analyses of the effects of the off-policy correction, the backup length parameter n, and the update frequency of the target network on the performance of these algorithms. Our results show that (1) using off-policy correction can have an adverse effect on the performance of Sarsa and Q(σ); (2) increasing the backup length n consistently improved performance across all the different algorithms; and (3) the performance of Sarsa and Q-learning was more robust to the target network update frequency than that of Tree Backup, Q(σ), and Retrace in this particular task.
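To make the role of the backup length n and the off-policy correction concrete, here is a minimal sketch of the recursive n-step Q(σ) return (Sutton & Barto, 2nd ed., Eq. 7.17), which the abstract's algorithms are special cases of: σ = 1 recovers a Sarsa-style sampled backup, σ = 0 recovers Tree Backup, and the per-step importance ratios ρ supply the off-policy correction. The function name, argument layout, and the choice to bootstrap from the action taken at the horizon are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def n_step_q_sigma_target(rewards, q_values, pi_probs, actions,
                          sigma, rho=None, gamma=0.99):
    """Recursively compute the n-step Q(sigma) return G_{t:t+n}.

    rewards  : [R_{t+1}, ..., R_{t+n}]                 (length n)
    q_values : [Q(S_{t+1}, .), ..., Q(S_{t+n}, .)]     (n arrays over actions)
    pi_probs : [pi(.|S_{t+1}), ..., pi(.|S_{t+n})]     (n arrays over actions)
    actions  : [A_{t+1}, ..., A_{t+n}]                 (length n)
    sigma    : degree of sampling; 1 -> Sarsa-like, 0 -> Tree Backup
    rho      : importance ratios pi/mu per step (None -> on-policy, all 1)
    """
    n = len(rewards)
    if rho is None:
        rho = np.ones(n)
    # Base case: bootstrap from the action actually taken at the horizon.
    G = q_values[-1][actions[-1]]
    # Work backwards from the horizon to the state being updated.
    for k in reversed(range(n)):
        q_sa = q_values[k][actions[k]]
        v_bar = np.dot(pi_probs[k], q_values[k])  # expected value under pi
        # Interpolate between sampling (with IS correction) and expectation.
        trace = sigma * rho[k] + (1.0 - sigma) * pi_probs[k][actions[k]]
        G = rewards[k] + gamma * trace * (G - q_sa) + gamma * v_bar
    return G
```

In a DQN-style agent, `q_values` would come from the periodically updated target network, which is why the update frequency studied in the paper interacts with the choice of backup.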


