Block Coordinate Descent for Deep Learning: Unified Convergence Guarantees

by Jinshan Zeng, et al.

Training deep neural networks (DNNs) efficiently is challenging because the underlying optimization problem is highly nonconvex. Recently, block coordinate descent (BCD) type methods have been shown empirically to be efficient for DNN training. The main idea of BCD is to decompose the highly composite and nonconvex DNN training problem into several almost separable, simple subproblems. However, the convergence properties of these methods have not been thoroughly studied. In this paper, we establish unified global convergence guarantees for BCD type methods across a wide range of DNN training models, including but not limited to multilayer perceptrons (MLPs), convolutional neural networks (CNNs), and residual networks (ResNets). This paper nontrivially extends the existing convergence results for nonconvex BCD from the smooth case to the nonsmooth case. Our convergence analysis builds on the Kurdyka-Łojasiewicz (KL) framework and introduces some new techniques, including establishing the KL property of the objective functions of many commonly used DNNs, where the loss function can be the squared, hinge, or logistic loss, and the activation function can be a rectified linear unit (ReLU), sigmoid, or linear link function. The efficiency of BCD methods is also demonstrated by a series of exploratory numerical experiments.
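To make the decomposition idea concrete, the following is a minimal sketch of BCD on a toy scalar network y ≈ w2 * relu(w1 * x). It assumes a penalized variable-splitting formulation in which auxiliary variables u_i (pre-activations) and v_i (activations) are introduced so that each block has a closed-form minimizer. The abstract does not spell out this formulation, so the objective, the penalty weights `gamma` and `rho`, the ridge weight `lam`, and all function names here are illustrative assumptions rather than the authors' algorithm.

```python
def relu(t):
    return t if t > 0.0 else 0.0


def objective(w1, w2, u, v, x, y, gamma, rho, lam):
    """Penalized (lifted) objective for the toy model y ~ w2 * relu(w1 * x)."""
    fit = 0.5 * sum((w2 * vi - yi) ** 2 for vi, yi in zip(v, y))
    act = 0.5 * gamma * sum((vi - relu(ui)) ** 2 for vi, ui in zip(v, u))
    lin = 0.5 * rho * sum((ui - w1 * xi) ** 2 for ui, xi in zip(u, x))
    reg = 0.5 * lam * (w1 ** 2 + w2 ** 2)
    return fit + act + lin + reg


def update_u(v_i, a_i, gamma, rho):
    """Exact minimizer over u of gamma/2*(v - relu(u))^2 + rho/2*(u - a)^2."""
    def g(u):
        return 0.5 * gamma * (v_i - relu(u)) ** 2 + 0.5 * rho * (u - a_i) ** 2

    # Branch u >= 0: convex quadratic, minimizer clipped at zero.
    u_pos = max((gamma * v_i + rho * a_i) / (gamma + rho), 0.0)
    # Branch u <= 0: relu(u) = 0, so only the second term depends on u.
    u_neg = min(a_i, 0.0)
    return u_pos if g(u_pos) <= g(u_neg) else u_neg


def bcd_train(x, y, gamma=1.0, rho=1.0, lam=1e-3, sweeps=50):
    """Cyclic BCD: each block is minimized exactly with the others held fixed."""
    w1, w2 = 0.5, 0.5                      # arbitrary starting weights
    u = [w1 * xi for xi in x]              # pre-activations
    v = [relu(ui) for ui in u]             # activations
    history = [objective(w1, w2, u, v, x, y, gamma, rho, lam)]
    for _ in range(sweeps):
        # Block 1: output weight (scalar ridge regression).
        w2 = sum(vi * yi for vi, yi in zip(v, y)) / (sum(vi * vi for vi in v) + lam)
        # Block 2: activations (separable quadratics, one per sample).
        v = [(w2 * yi + gamma * relu(ui)) / (w2 * w2 + gamma)
             for yi, ui in zip(y, u)]
        # Block 3: pre-activations (separable, nonsmooth, solved per sample).
        u = [update_u(vi, w1 * xi, gamma, rho) for vi, xi in zip(v, x)]
        # Block 4: input weight (scalar ridge regression).
        w1 = rho * sum(ui * xi for ui, xi in zip(u, x)) / (
            rho * sum(xi * xi for xi in x) + lam)
        history.append(objective(w1, w2, u, v, x, y, gamma, rho, lam))
    return w1, w2, history
```

Calling `bcd_train(x, y)` on two equal-length lists of floats returns the two weights and the objective value after each sweep. Because every block update in this sketch is an exact minimization, the recorded objective values should never increase, which is the kind of descent behavior that convergence analyses of BCD typically start from.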


A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training

Training deep neural networks (DNNs) efficiently is a challenge due to t...

A Unified Framework for Training Neural Networks

The lack of mathematical tractability of Deep Neural Networks (DNNs) has...

Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods

Training deep neural networks (DNNs) is an important and challenging opt...

A Convergence Analysis of Nonlinearly Constrained ADMM in Deep Learning

Efficient training of deep neural networks (DNNs) is a challenge due to ...

0/1 Deep Neural Networks via Block Coordinate Descent

The step function is one of the simplest and most natural activation fun...

Understanding Progressive Training Through the Framework of Randomized Coordinate Descent

We propose a Randomized Progressive Training algorithm (RPT) – a stochas...

Edge2Vec: A High Quality Embedding for the Jigsaw Puzzle Problem

Pairwise compatibility measure (CM) is a key component in solving the ji...
