AdaX: Adaptive Gradient Descent with Exponential Long Term Memory

04/21/2020
by Wenjie Li, et al.

Although adaptive optimization algorithms such as Adam show fast convergence in many machine learning tasks, this paper identifies a problem with Adam by analyzing its behavior on a simple non-convex synthetic problem, showing that Adam's fast convergence can drive the algorithm toward poor local minima. To address this problem, we improve Adam by proposing a novel adaptive gradient descent algorithm named AdaX. Unlike Adam, which gradually forgets past gradient information, AdaX exponentially accumulates long-term gradient information during training in order to adaptively tune the learning rate. We prove the convergence of AdaX in both the convex and non-convex settings. Extensive experiments show that AdaX outperforms Adam on various computer vision and natural language processing tasks and can match the performance of Stochastic Gradient Descent (SGD).
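The abstract only sketches the idea at a high level. As a rough illustration of the contrast between Adam's exponential moving average and an exponentially accumulated long-term second moment, the Python snippet below implements an Adam-like update whose second-moment statistic grows instead of decaying. The function name `adax_like_step`, the coefficient `beta2 = 1e-4`, and the `(1 + beta2)` accumulation with its bias correction are illustrative assumptions, not the paper's verbatim update rule.

```python
import numpy as np

def adax_like_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=1e-4, eps=1e-8):
    """One Adam-style step whose second moment accumulates long-term
    gradient information instead of exponentially decaying it.
    Illustrative sketch only; see the paper for the exact AdaX rule."""
    t = state["t"] + 1

    # First moment: the usual exponential moving average of gradients.
    m = beta1 * state["m"] + (1.0 - beta1) * grads

    # Second moment: grow the accumulator so old gradients are never
    # forgotten (assumed form; Adam would use beta2 * v + (1 - beta2) * g**2).
    v = (1.0 + beta2) * state["v"] + beta2 * grads ** 2

    # Normalize the accumulator so that at step 1 it equals grads**2.
    v_hat = v / ((1.0 + beta2) ** t - 1.0)

    new_params = params - lr * m / (np.sqrt(v_hat) + eps)
    return new_params, {"m": m, "v": v, "t": t}

# Minimal usage example on a toy quadratic objective f(w) = 0.5 * ||w||^2.
w = np.array([1.0, -2.0, 3.0])
state = {"m": np.zeros_like(w), "v": np.zeros_like(w), "t": 0}
for _ in range(100):
    grad = w  # gradient of the toy objective
    w, state = adax_like_step(w, grad, state)
```

Because the weight assigned to past squared gradients does not decay away as it does in Adam, the per-coordinate learning rate in this sketch is shaped by long-term rather than only recent gradient information, which is the behavior the abstract attributes to AdaX.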


Related research:

05/21/2018
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
Stochastic gradient descent is the method of choice for large scale opti...

02/19/2021
Local Convergence of Adaptive Gradient Descent Optimizers
Adaptive Moment Estimation (ADAM) is a very popular training algorithm f...

07/12/2023
Provably Faster Gradient Descent via Long Steps
This work establishes provably faster convergence rates for gradient des...

09/12/2023
ELRA: Exponential learning rate adaption gradient descent optimization method
We present a novel, fast (exponential rate adaption), ab initio (hyper-p...

04/27/2022
Can deep learning match the efficiency of human visual long-term memory in storing object details?
Humans have a remarkably large capacity to store detailed visual informa...

02/15/2015
Equilibrated adaptive learning rates for non-convex optimization
Parameter-specific adaptive learning rate methods are computationally ef...

12/04/2019
Exponential convergence of Sobolev gradient descent for a class of nonlinear eigenproblems
We propose to use the Łojasiewicz inequality as a general tool for analy...
