Bounding the expected run-time of nonconvex optimization with early stopping

02/20/2020
by Thomas Flynn, et al.

This work examines the convergence of stochastic gradient-based optimization algorithms that use early stopping based on a validation function. The form of early stopping we consider is that optimization terminates when the norm of the gradient of a validation function falls below a threshold. We derive conditions that guarantee this stopping rule is well-defined, and provide bounds on the expected number of iterations and gradient evaluations needed to meet this criterion. The guarantee accounts for the distance between the training and validation sets, measured with the Wasserstein distance. We develop the approach in the general setting of a first-order optimization algorithm, with possibly biased update directions subject to a geometric drift condition. We then derive bounds on the expected running time for early stopping variants of several algorithms, including stochastic gradient descent (SGD), decentralized SGD (DSGD), and the stochastic variance reduced gradient (SVRG) algorithm. Finally, we consider the generalization properties of the iterate returned by early stopping.
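As a concrete illustration of the stopping rule, the following minimal Python sketch runs SGD on a training objective and checks the norm of the validation-function gradient at regular intervals, halting once it falls below a threshold. This is not the paper's procedure or code: the quadratic loss, held-out split, step size, batch size, check interval, and threshold are all illustrative assumptions, and the threshold is deliberately set large enough to absorb the gap between the training and validation sets, in the spirit of the condition in the abstract that accounts for the distance between the two sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a held-out split of the same synthetic dataset plays
# the role of the validation set.
data = rng.normal(size=(1200, 5))
train, valid = data[:1000], data[1000:]

def grad_loss(w, batch):
    # Gradient of the average quadratic loss 0.5 * ||w - x||^2 over the batch.
    return w - batch.mean(axis=0)

w = rng.normal(size=5)
step = 0.1          # SGD step size (illustrative)
eps = 0.3           # stopping threshold; chosen to exceed the train/validation gap
check_every = 10    # how often the validation gradient is evaluated
max_iters = 10_000

for t in range(1, max_iters + 1):
    batch = train[rng.choice(len(train), size=32)]
    w -= step * grad_loss(w, batch)          # stochastic gradient step on the training loss
    if t % check_every == 0:
        g_val = grad_loss(w, valid)          # full gradient of the validation function
        if np.linalg.norm(g_val) < eps:      # early-stopping criterion from the abstract
            print(f"stopped at iteration {t}, validation gradient norm "
                  f"{np.linalg.norm(g_val):.3f}")
            break
```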

Related research

Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms (01/09/2020)
In this paper, we propose an adaptive stopping rule for kernel-based gra...

Stopping Criteria for, and Strong Convergence of, Stochastic Gradient Descent on Bottou-Curtis-Nocedal Functions (04/01/2020)
While Stochastic Gradient Descent (SGD) is a rather efficient algorithm ...

Early Stopping without a Validation Set (03/28/2017)
Early stopping is a widely used technique to prevent poor generalization...

Stability and Generalization of Learning Algorithms that Converge to Global Optima (10/23/2017)
We establish novel generalization bounds for learning algorithms that co...

AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy (12/24/2020)
Optimizers that further adjust the scale of gradient, such as Adam, Natu...

Early Stopping is Nonparametric Variational Inference (04/06/2015)
We show that unconverged stochastic gradient descent can be interpreted ...

A termination criterion for stochastic gradient descent for binary classification (03/23/2020)
We propose a new, simple, and computationally inexpensive termination te...
