Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training

In this work we analyze the role nonlinear activation functions play at stationary points of dense neural network training problems, formulated as generic least-squares losses. The nonlinear activation functions used in the network construction prove critical in classifying stationary points of the loss landscape. For shallow dense networks, we show that the activation function determines the Hessian nullspace in the vicinity of global minima (when they exist), and therefore determines the ill-posedness of the training problem. Furthermore, for shallow nonlinear networks we show that zeros of the activation function and its derivatives can lead to spurious local minima, and we discuss conditions for strict saddle points. We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points, due to how it enters the gradient through the chain rule.
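The link between global minima and ill-posedness can be illustrated numerically. At a zero-residual minimum of a least-squares loss (1/2)||r(w)||², the Hessian reduces to the Gauss-Newton term JᵀJ, so rank deficiency of the Jacobian J translates directly into a Hessian nullspace. The sketch below, which is an illustrative toy setup and not code from the paper (the network size, data, and tolerance are assumptions), fits an overparameterized shallow tanh network and counts the near-zero Hessian eigenvalues at the generating parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shallow network f(x; w) = sum_j a_j * tanh(b_j * x),
# with m hidden units (2*m parameters) and n < 2*m data points.
m, n = 10, 5
x = rng.normal(size=n)
w_true = rng.normal(size=2 * m)   # parameters used to generate the data

def model(w, x):
    b, a = w[:m], w[m:]
    return np.tanh(np.outer(x, b)) @ a   # shape (n,)

y = model(w_true, x)              # so w_true is a zero-residual global minimum

def residual(w):
    return model(w, x) - y

# Central finite-difference Jacobian of the residual at w_true.
eps = 1e-6
J = np.column_stack([
    (residual(w_true + eps * e) - residual(w_true - eps * e)) / (2 * eps)
    for e in np.eye(2 * m)
])

# At a zero-residual minimum the loss Hessian equals J^T J.
H = J.T @ J
eigvals = np.linalg.eigvalsh(H)
null_dim = int(np.sum(eigvals < 1e-8))
print(null_dim)   # at least 2*m - n flat directions, since rank(J) <= n
```

Since J has at most n rows, its rank is at most n, so the Hessian at this global minimum has a nullspace of dimension at least 2m - n: the training problem is ill-posed, and which directions are flat depends on the activation through the columns of J.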


