Escaping Saddle Points in Nonconvex Minimax Optimization via Cubic-Regularized Gradient Descent-Ascent

by Ziyi Chen et al.

The gradient descent-ascent (GDA) algorithm has been widely applied to solve nonconvex minimax optimization problems. However, existing GDA-type algorithms can only find first-order stationary points of the envelope function of a nonconvex minimax problem, which does not rule out the possibility of getting stuck at suboptimal saddle points. In this paper, we develop Cubic-GDA, the first GDA-type algorithm for escaping strict saddle points in nonconvex-strongly-concave minimax optimization. Specifically, the algorithm uses gradient ascent to estimate the second-order information of the minimax objective function, and it leverages the cubic regularization technique to efficiently escape strict saddle points. Under standard smoothness assumptions on the objective function, we show that Cubic-GDA admits an intrinsic potential function whose value monotonically decreases throughout the minimax optimization process. This property yields the desired global convergence of Cubic-GDA to a second-order stationary point at a sublinear rate. Moreover, we analyze the convergence rate of Cubic-GDA over the full spectrum of a gradient dominance-type nonconvex geometry. Our result shows that Cubic-GDA achieves an orderwise faster convergence rate than standard GDA for a wide range of gradient dominance geometries. Our study bridges minimax optimization with second-order optimization and may inspire new developments along this direction.
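The two ingredients described above (an inner gradient-ascent loop to estimate second-order information of the envelope function, and a cubic-regularized step on the min variable) can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the objective f(x, y) = x₁² − x₂² + x₂⁴/4 + ⟨x, y⟩ − ‖y‖²/2 (nonconvex in x, strongly concave in y, with a strict saddle of the envelope at the origin), the step sizes, and the regularization constant M are all illustrative assumptions, and the cubic subproblem is solved here by plain gradient descent for simplicity.

```python
import numpy as np

# Illustrative hyperparameters (assumed; tune per problem).
M = 10.0        # cubic-regularization parameter
ETA_Y = 0.5     # gradient-ascent step size for the inner maximization
T_ASCENT = 50   # inner gradient-ascent iterations
EPS = 1e-5      # finite-difference width for the Hessian estimate

# Toy objective: f(x, y) = x1^2 - x2^2 + x2^4/4 + <x, y> - ||y||^2 / 2.
# Its envelope Phi(x) = max_y f(x, y) has a strict saddle at the origin.
def grad_x(x, y):
    return np.array([2 * x[0], -2 * x[1] + x[1] ** 3]) + y

def grad_y(x, y):
    return x - y

def approx_grad_phi(x):
    # Inner loop: gradient ascent in y approximates y*(x),
    # so grad Phi(x) is estimated by grad_x f(x, y_T).
    y = np.zeros_like(x)
    for _ in range(T_ASCENT):
        y = y + ETA_Y * grad_y(x, y)
    return grad_x(x, y)

def approx_hess_phi(x):
    # Finite-difference Hessian of Phi built from the estimated gradient.
    n = x.size
    H = np.zeros((n, n))
    g0 = approx_grad_phi(x)
    for i in range(n):
        e = np.zeros(n)
        e[i] = EPS
        H[:, i] = (approx_grad_phi(x + e) - g0) / EPS
    return 0.5 * (H + H.T)  # symmetrize

def cubic_step(g, H, iters=200, lr=0.05):
    # Minimize the cubic model m(s) = <g, s> + s^T H s / 2 + M ||s||^3 / 6
    # by gradient descent (a simple stand-in for a cubic subproblem solver).
    s = np.zeros_like(g)
    for _ in range(iters):
        grad_m = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
        s = s - lr * grad_m
    return s

def cubic_gda(x0, outer_iters=30):
    x = np.array(x0, dtype=float)
    for _ in range(outer_iters):
        g = approx_grad_phi(x)
        H = approx_hess_phi(x)
        x = x + cubic_step(g, H)
    return x
```

Starting near the strict saddle at the origin, the negative-curvature direction of the estimated Hessian enters the cubic model, so the update escapes along it and converges toward one of the local minimizers of the envelope function (near x = (0, ±1) in this toy problem), which a first-order GDA iterate could approach only slowly from such a point.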


Finding Second-Order Stationary Point for Nonconvex-Strongly-Concave Minimax Problem

We study the smooth minimax optimization problem of the form min_x max_y ...

Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry

The gradient descent-ascent (GDA) algorithm has been widely applied to s...

Convergence of Cubic Regularization for Nonconvex Optimization under KL Property

Cubic-regularized Newton's method (CR) is a popular algorithm that guara...

Second-order Guarantees of Distributed Gradient Algorithms

We consider distributed smooth nonconvex unconstrained optimization over...

A Dual-Dimer Method for Training Physics-Constrained Neural Networks with Minimax Architecture

Data sparsity is a common issue to train machine learning tools such as ...

Semi-Implicit Hybrid Gradient Methods with Application to Adversarial Robustness

Adversarial examples, crafted by adding imperceptible perturbations to n...

An abstract convergence framework with application to inertial inexact forward–backward methods

In this paper we introduce a novel abstract descent scheme suited for th...