Towards Characterizing Divergence in Deep Q-Learning

03/21/2019
by Joshua Achiam, et al.

Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three techniques collectively known as the "deadly triad" in reinforcement learning: bootstrapping, off-policy learning, and function approximation. Prior work has demonstrated that together these can lead to divergence in Q-learning algorithms, but the conditions under which divergence occurs are not well-understood. In this note, we give a simple analysis based on a linear approximation to the Q-value updates, which we believe provides insight into divergence under the deadly triad. The central point in our analysis is to consider when the leading-order approximation to the deep-Q update is or is not a contraction in the sup norm. Based on this analysis, we develop an algorithm which permits stable deep Q-learning for continuous control without any of the tricks conventionally used (such as target networks, adaptive gradient optimizers, or using multiple Q functions). We demonstrate that our algorithm performs above or near state-of-the-art on standard MuJoCo benchmarks from the OpenAI Gym.
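The divergence the abstract refers to is easy to reproduce in miniature. The following sketch (not from the paper; it is the classic two-state "w, 2w" counterexample of Tsitsiklis and Van Roy) shows how combining bootstrapping, off-policy sampling, and linear function approximation makes a single TD weight grow without bound when the discount factor exceeds 0.5:

```python
# Two states with linear values V(s1) = w and V(s2) = 2*w, zero reward,
# and a deterministic transition s1 -> s2. Updating w only on the
# s1 -> s2 transition (an off-policy state distribution) yields
# w <- w + alpha * (gamma*2*w - w), i.e. growth by (1 + alpha*(2*gamma - 1))
# per step, which diverges whenever gamma > 0.5.

gamma = 0.9   # discount factor
alpha = 0.1   # step size
w = 1.0       # single shared weight

for step in range(100):
    td_error = 0.0 + gamma * 2.0 * w - 1.0 * w   # r + gamma*V(s2) - V(s1)
    w += alpha * td_error * 1.0                  # feature of s1 is 1

print(w)  # grows geometrically; roughly 1.08**100, far above its start of 1.0
```

The update operator here has sup-norm expansion factor 1 + alpha*(2*gamma - 1) > 1, which is exactly the kind of non-contraction the note's leading-order analysis is designed to detect.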


Related research:

- Backstepping Temporal Difference Learning (02/20/2023): Off-policy learning ability is an important feature of reinforcement lea...
- AWD3: Dynamic Reduction of the Estimation Bias (11/12/2021): Value-based deep Reinforcement Learning (RL) algorithms suffer from the ...
- A Kernel Loss for Solving the Bellman Equation (05/25/2019): Value function learning plays a central role in many state-of-the-art re...
- Divergence of the ADAM algorithm with fixed-stepsize: a (very) simple example (08/01/2023): A very simple unidimensional function with Lipschitz continuous gradient...
- Gradient Temporal-Difference Learning with Regularized Corrections (07/01/2020): It is still common to use Q-learning and temporal difference (TD) learni...
- Learning Deep Energy Models: Contrastive Divergence vs. Amortized MLE (07/04/2017): We propose a number of new algorithms for learning deep energy models an...
- Rényi Divergence Deep Mutual Learning (09/13/2022): This paper revisits an incredibly simple yet exceedingly effective compu...
