High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails

06/28/2021 ∙ by Ashok Cutkosky, et al.
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability at the best-known rates for smooth losses when the gradients only have bounded 𝔭th moments for some 𝔭 ∈ (1,2]. We then consider the case of second-order smooth losses, which, to our knowledge, has not been studied in this setting, and again obtain high-probability bounds for any 𝔭. Moreover, our results hold for arbitrary smooth norms, in contrast to the typical SGD analysis, which requires a Hilbert space norm. Further, we show that after a suitable "burn-in" period, the objective value decreases monotonically at every iteration until a critical point is identified, which provides intuition for the popular practice of learning-rate "warm-up" and also yields a last-iterate guarantee.
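To make the update described in the abstract concrete, the following is a minimal NumPy sketch of a single step that combines gradient clipping, momentum, and normalized gradient descent. The function name and the parameters (lr, beta, clip_threshold, eps) are illustrative assumptions for exposition, not the paper's exact algorithm or constants.

import numpy as np

def clipped_momentum_normalized_step(params, momentum, grad, lr=0.01,
                                     beta=0.9, clip_threshold=1.0,
                                     eps=1e-12):
    """One illustrative update combining gradient clipping, momentum,
    and normalized gradient descent (a sketch, not the paper's method).

    params:          current iterate (numpy array)
    momentum:        running momentum buffer (numpy array)
    grad:            stochastic gradient estimate, possibly heavy-tailed
    lr:              step size
    beta:            momentum averaging parameter
    clip_threshold:  gradients with norm above this are rescaled
    eps:             avoids division by zero when normalizing
    """
    # Clip the (possibly heavy-tailed) stochastic gradient to a bounded norm.
    grad_norm = np.linalg.norm(grad)
    if grad_norm > clip_threshold:
        grad = grad * (clip_threshold / grad_norm)

    # Exponential-moving-average momentum of the clipped gradients.
    momentum = beta * momentum + (1.0 - beta) * grad

    # Normalized descent: move in the direction of the momentum,
    # with step length set by the learning rate alone.
    params = params - lr * momentum / (np.linalg.norm(momentum) + eps)
    return params, momentum

Because the update direction is normalized, every step has length at most lr regardless of how large the raw gradient estimate is, which is the intuition for why such updates remain well behaved under heavy-tailed noise.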

