AdaLoss: A computationally-efficient and provably convergent adaptive gradient method

09/17/2021
by   Xiaoxia Wu, et al.

We propose a computationally friendly adaptive learning rate schedule, "AdaLoss", which directly uses information from the loss function to adjust the step size in gradient descent methods. We prove that this schedule enjoys linear convergence for linear regression. Moreover, we provide a linear convergence guarantee in the non-convex regime, in the context of two-layer over-parameterized neural networks: if the width of the hidden layer in the two-layer network is sufficiently large (polynomially in the problem parameters), then AdaLoss converges robustly to the global minimum in polynomial time. We numerically verify the theoretical results and extend the scope of the numerical experiments to LSTM models for text classification and to policy gradient methods for control problems.
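The core idea can be sketched in a few lines: maintain a running accumulator that grows with the observed loss, and divide a base step size by the square root of that accumulator, so the effective step size shrinks while the loss remains large. The sketch below applies this to least-squares linear regression; the exact accumulator update, the initialization b2, and the base step size eta are illustrative assumptions and may differ from the schedule analyzed in the paper.

```python
import numpy as np

# Minimal sketch of a loss-driven adaptive step size on least-squares
# linear regression. Growing the accumulator with the current loss value
# (instead of the squared gradient norm, as AdaGrad-Norm does) is an
# illustrative reading of the abstract; the paper's exact update and
# constants may differ.

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star                      # noiseless targets, so the global minimum is zero loss

def loss(w):
    r = X @ w - y
    return 0.5 * np.dot(r, r) / n

def grad(w):
    return X.T @ (X @ w - y) / n

w = np.zeros(d)
eta = 1.0      # base step size (hypothetical choice)
b2 = 1e-2      # squared accumulator b_0^2 > 0 (hypothetical initialization)

for t in range(2000):
    b2 += loss(w)                          # AdaLoss-style: accumulator grows with the loss
    w -= (eta / np.sqrt(b2)) * grad(w)     # effective step size eta / b_t shrinks adaptively

print(f"final loss: {loss(w):.3e}")
```

The appeal over a gradient-norm accumulator is computational: the loss value is typically already available from the forward pass, so the schedule needs no extra norm computation per step.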

Related research

- Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network (02/19/2019)
  Adaptive gradient methods like AdaGrad are widely used in optimizing neu...
- Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks (04/14/2022)
  We prove linear convergence of gradient descent to a global minimum for ...
- Learning One-hidden-layer neural networks via Provable Gradient Descent with Random Initialization (07/04/2019)
  Although deep learning has shown its powerful performance in many applic...
- Can Gradient Descent Provably Learn Linear Dynamic Systems? (11/19/2022)
  We study the learning ability of linear recurrent neural networks with g...
- On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport (05/24/2018)
  Many tasks in machine learning and signal processing can be solved by mi...
- Equilibrated adaptive learning rates for non-convex optimization (02/15/2015)
  Parameter-specific adaptive learning rate methods are computationally ef...
- Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning (02/02/2023)
  We consider the optimisation of large and shallow neural networks via gr...
