The Multilinear Structure of ReLU Networks

by Thomas Laurent, et al.

We study the loss surface of neural networks equipped with a hinge loss criterion and ReLU or leaky ReLU nonlinearities. Any such network defines a piecewise multilinear form in parameter space, and as a consequence, optima of such networks generically occur in non-differentiable regions of parameter space. Any understanding of such networks must therefore carefully take into account their non-smooth nature. We show how to use techniques from nonsmooth analysis to study these non-differentiable loss surfaces. Our analysis focuses on three different scenarios: (1) a deep linear network with hinge loss and arbitrary data, (2) a one-hidden-layer network with leaky ReLUs and linearly separable data, and (3) a one-hidden-layer network with ReLU nonlinearities and linearly separable data. We show that all local minima are global minima in the first two scenarios. A bifurcation occurs when passing from the second to the third scenario, in that ReLU networks do have non-optimal local minima. We provide a complete description of such sub-optimal solutions. We conclude by investigating the extent to which these phenomena do, or do not, persist in the multiclass setting.
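The multilinear structure described in the abstract can be illustrated numerically. A minimal NumPy sketch (not from the paper; the network shape and weights are arbitrary illustrative choices): for a one-hidden-layer ReLU network with fixed input, scaling either layer's weights by a positive constant preserves the activation pattern, so the output scales linearly in that layer, i.e. the network is (positively) homogeneous in each layer's parameters separately.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def net(W1, W2, x):
    # One-hidden-layer ReLU network with scalar output.
    return W2 @ relu(W1 @ x)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 3))   # hidden layer: 5 units, 3 inputs
W2 = rng.normal(size=(1, 5))   # output layer
x = rng.normal(size=3)

base = net(W1, W2, x)

# The output is exactly linear in W2 (hidden activations are fixed).
assert np.allclose(net(W1, 2.0 * W2, x), 2.0 * base)

# Scaling W1 by t > 0 keeps every hidden unit's sign pattern,
# so the output also scales linearly in W1 on that region.
assert np.allclose(net(3.0 * W1, W2, x), 3.0 * base)

print("piecewise multilinear structure verified on this region")
```

The linearity in each layer holds only within a region of parameter space where the activation pattern is constant; crossing between regions is exactly where the non-differentiable behavior the paper studies arises.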




Understanding Global Loss Landscape of One-hidden-layer ReLU Neural Networks

For one-hidden-layer ReLU networks, we show that all local minima are gl...

Understanding Global Loss Landscape of One-hidden-layer ReLU Networks, Part 2: Experiments and Analysis

The existence of local minima for one-hidden-layer ReLU networks has bee...

Landscape analysis for shallow ReLU neural networks: complete classification of critical points for affine target functions

In this paper, we analyze the landscape of the true loss of a ReLU neura...

Mildly Overparameterized ReLU Networks Have a Favorable Loss Landscape

We study the loss landscape of two-layer mildly overparameterized ReLU n...

Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks

The training process of ReLU neural networks often exhibits complicated ...

Spurious Local Minima are Common in Two-Layer ReLU Neural Networks

We consider the optimization problem associated with training simple ReL...

Exponentially vanishing sub-optimal local minima in multilayer neural networks

Background: Statistical mechanics results (Dauphin et al. (2014); Chorom...
