AdaL: Adaptive Gradient Transformation Contributes to Convergence and Generalization

07/04/2021
by Hongwei Zhang, et al.

Adaptive optimization methods are widely used in deep learning. They scale the learning rate adaptively according to past gradients, which has been shown to accelerate convergence. However, they suffer from poorer generalization than SGD. Recent studies suggest that the exponential smoothing of gradient noise in such methods is responsible for this generalization gap. Inspired by this, we propose AdaL, which applies a transformation to the raw gradient. AdaL accelerates convergence by amplifying the gradient in the early stage of training, and it dampens oscillation and stabilizes the optimization by shrinking the gradient later. This modification reduces the smoothing of gradient noise, which yields better generalization. We theoretically prove the convergence of AdaL and demonstrate its effectiveness on several benchmarks.
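The abstract describes the transformation only qualitatively (amplify the gradient early, shrink it later) and does not give its exact form. As a rough illustration only, below is a minimal NumPy sketch of an Adam-style update in which the raw gradient is rescaled by a linearly decaying factor before entering the moment estimates. The schedule, its endpoints amp_start/amp_end, and the name adal_sketch are assumptions made for illustration, not the authors' algorithm.

    import numpy as np

    def adal_sketch(grad_fn, theta0, steps, lr=1e-2, betas=(0.9, 0.999),
                    eps=1e-8, amp_start=2.0, amp_end=0.5):
        # Illustrative only: the multiplicative schedule below is an
        # assumption standing in for AdaL's transformation, which the
        # abstract does not specify.
        theta = np.asarray(theta0, dtype=float).copy()
        m = np.zeros_like(theta)
        v = np.zeros_like(theta)
        for t in range(1, steps + 1):
            g = grad_fn(theta)
            # Scale factor decays linearly: > 1 early (amplify the
            # gradient), < 1 late (shrink it and damp oscillation).
            c = amp_start + (amp_end - amp_start) * (t - 1) / max(steps - 1, 1)
            g_hat = c * g
            # Standard Adam moment estimates on the transformed gradient.
            m = betas[0] * m + (1 - betas[0]) * g_hat
            v = betas[1] * v + (1 - betas[1]) * g_hat ** 2
            m_hat = m / (1 - betas[0] ** t)   # bias correction
            v_hat = v / (1 - betas[1] ** t)
            theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta

    # Example: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
    x_star = adal_sketch(lambda x: x, theta0=np.ones(3), steps=500)

Applying the decay to the gradient before the moment estimates (rather than to the learning rate) is the point of the sketch: both the first and second moments then see large gradients early and small ones late, mimicking the amplify-then-shrink behavior the abstract attributes to AdaL.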


Related research

07/18/2021 · A New Adaptive Gradient Method with Gradient Decomposition
Adaptive gradient methods, especially Adam-type methods (such as Adam, A...

06/10/2019 · Adaptively Preconditioned Stochastic Gradient Langevin Dynamics
Stochastic Gradient Langevin Dynamics infuses isotropic gradient noise t...

10/12/2020 · Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning
It is not clear yet why ADAM-alike adaptive gradient algorithms suffer f...

05/23/2018 · Predictive Local Smoothness for Stochastic Gradient Methods
Stochastic gradient methods are dominant in nonconvex optimization espec...

02/26/2019 · Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Adaptive optimization methods such as AdaGrad, RMSprop and Adam have bee...

06/12/2020 · Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs
Adaptive gradient methods have attracted much attention of machine learn...

02/26/2020 · Disentangling Adaptive Gradient Methods from Learning Rates
We investigate several confounding factors in the evaluation of optimiza...
