Learning Expected Emphatic Traces for Deep RL

07/12/2021
by Ray Jiang, et al.

Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods. Combined with function approximation, such as neural networks, they form what is known as the deadly triad, which is potentially unstable. Recently, it has been shown that stability and good performance at scale can be achieved by combining emphatic weightings and multi-step updates. This approach, however, is generally limited to sampling complete trajectories in order, so that the required emphatic weighting can be computed. In this paper we investigate how to combine emphatic weightings with non-sequential, off-line data sampled from a replay buffer. We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed n-step TD learning algorithm to learn the required emphatic weighting. We show that these state weightings reduce variance compared with prior approaches, while providing convergence guarantees. We tested the approach at scale on Atari 2600 video games, and observed that the new X-ETD(n) agent improved over baseline agents, highlighting both the scalability and broad applicability of our approach.
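
To illustrate why emphatic weightings usually require trajectories sampled in order, here is a minimal sketch of the standard one-step followon trace from emphatic TD, F_t = gamma * rho_{t-1} * F_{t-1} + i_t. It is not the X-ETD(n) method described above. The function name `followon_trace`, the constant discount, and the constant interest of 1 are assumptions made for illustration.

```python
import numpy as np


def followon_trace(rhos, gamma=0.99, interest=1.0):
    """Accumulate the one-step followon trace along one ordered trajectory.

    rhos[t] is the importance-sampling ratio pi(a_t|s_t) / mu(a_t|s_t).
    Hypothetical helper: constant discount and constant interest are assumed.
    """
    rhos = np.asarray(rhos, dtype=float)
    trace = np.empty(len(rhos))
    f = interest  # F_0 = i_0
    for t in range(len(rhos)):
        if t > 0:
            # F_t depends on F_{t-1}, so steps must be visited in order.
            f = gamma * rhos[t - 1] * f + interest
        trace[t] = f
    return trace


if __name__ == "__main__":
    print(followon_trace([1.0, 1.0, 1.0], gamma=0.5))
```

Because each value depends on the one before it, a transition drawn at random from a replay buffer has no trace available unless it was stored or is estimated separately, which is the gap a learned expected trace is meant to fill.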


research
10/04/2021

Large Batch Experience Replay

Several algorithms have been proposed to sample non-uniformly the replay...
research
02/20/2023

Understanding the effect of varying amounts of replay per step

Model-based reinforcement learning uses models to plan, where the predic...
research
06/16/2023

Temporal Difference Learning with Experience Replay

Temporal-difference (TD) learning is widely regarded as one of the most ...
research
02/08/2019

Source Traces for Temporal Difference Learning

This paper motivates and develops source traces for temporal difference ...
research
10/28/2021

Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment

This paper proposes a method for prioritizing the replay experience refe...
research
08/09/2022

Model-Free Generative Replay for Lifelong Reinforcement Learning: Application to Starcraft-2

One approach to meet the challenges of deep lifelong reinforcement learn...
research
03/29/2022

Topological Experience Replay

State-of-the-art deep Q-learning methods update Q-values using state tra...
