Accelerated Convergence of Nesterov's Momentum for Deep Neural Networks under Partial Strong Convexity

06/13/2023

Current state-of-the-art analyses of the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-Łojasiewicz (PL) condition and restricted strong convexity. While gradient descent converges linearly under such conditions, it remains an open question whether Nesterov's momentum enjoys accelerated convergence under similar settings and assumptions. In this work, we consider a new class of objective functions, in which only a subset of the parameters satisfies strong convexity, and show that Nesterov's momentum achieves acceleration in theory for this objective class. We provide two realizations of the problem class, one of which is deep ReLU networks. To the best of our knowledge, this makes our work the first to prove an accelerated convergence rate for non-trivial neural network architectures.
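To make the notion of acceleration concrete, the following minimal sketch compares plain gradient descent with Nesterov's momentum on a toy diagonal quadratic. It is only an illustration under stated assumptions: the objective is fully strongly convex (not the partially strongly convex class studied in this work, and not a neural network), and the curvature values `h`, the step size `1/L`, and the momentum parameter `beta = (sqrt(kappa) - 1) / (sqrt(kappa) + 1)` are textbook choices for strongly convex problems, not settings taken from the paper.

```python
import math


def loss(h, x):
    # Toy objective f(x) = 0.5 * sum_i h_i * x_i^2 (assumed for illustration).
    return 0.5 * sum(hi * xi * xi for hi, xi in zip(h, x))


def grad(h, x):
    # Gradient of the diagonal quadratic above.
    return [hi * xi for hi, xi in zip(h, x)]


def gradient_descent(h, x0, steps, lr):
    x = list(x0)
    for _ in range(steps):
        g = grad(h, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def nesterov(h, x0, steps, lr, beta):
    # x is the extrapolated point, y the gradient-step iterate.
    x = list(x0)
    y_prev = list(x0)
    for _ in range(steps):
        g = grad(h, x)
        y = [xi - lr * gi for xi, gi in zip(x, g)]
        x = [yi + beta * (yi - ypi) for yi, ypi in zip(y, y_prev)]
        y_prev = y
    return y_prev


# Hypothetical curvatures: strong convexity mu = 0.01, smoothness L = 1.0.
h = [0.01, 0.1, 0.5, 1.0]
x0 = [1.0] * len(h)
mu, L = min(h), max(h)
kappa = L / mu
lr = 1.0 / L
beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)

gd_loss = loss(h, gradient_descent(h, x0, 200, lr))
nesterov_loss = loss(h, nesterov(h, x0, 200, lr, beta))
print("gradient descent loss:", gd_loss)
print("Nesterov momentum loss:", nesterov_loss)
```

On this toy problem the slowest mode of gradient descent contracts by a factor of 1 - 1/kappa per step, while the corresponding mode under Nesterov's momentum contracts at roughly 1 - 1/sqrt(kappa), which is the kind of gap the term "accelerated convergence" refers to.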
