Normalized gradient flow optimization in the training of ReLU artificial neural networks

by Simon Eberle, et al.

The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions of affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional so-called activation function. The most popular choice of such a one-dimensional activation function is the rectified linear unit (ReLU) activation function, which maps a real number to its positive part: ℝ ∋ x ↦ max{x, 0} ∈ ℝ. In this article we propose and analyze a modified variant of the standard training procedure of such ReLU ANNs: we propose to restrict the negative gradient flow dynamics to a large submanifold of the ANN parameter space. This submanifold is a strict C^∞-submanifold of the entire ANN parameter space that seems to enjoy better regularity properties than the entire ANN parameter space, but it is also sufficiently large and sufficiently high dimensional so that it can represent all ANN realization functions that can be represented through the entire ANN parameter space. In the special situation of shallow ANNs with just one-dimensional ANN layers, we also prove for every Lipschitz continuous target function that every gradient flow trajectory on this large submanifold of the ANN parameter space is globally bounded. For the standard gradient flow on the entire ANN parameter space with Lipschitz continuous target functions, it remains an open problem of research to prove or disprove the global boundedness of gradient flow trajectories, even in the situation of shallow ANNs with just one-dimensional ANN layers.
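To make the setting concrete, the following is a minimal sketch of a shallow ReLU ANN with one-dimensional layers, trained by an explicit Euler discretization of the standard (unrestricted) negative gradient flow. It does not implement the restriction to the submanifold proposed in the article, since the abstract does not specify that construction. All names (`realization`, `risk`, `euler_gradient_flow`), the single hidden neuron, the grid approximation of the squared-error risk on [0, 1], the step size, and the convention that the ReLU derivative at 0 is 0 are assumptions chosen for illustration.

```python
def relu(x):
    # Rectified linear unit: maps a real number to its positive part.
    return max(x, 0.0)


def realization(theta, x):
    # Shallow ANN with one-dimensional input, hidden and output layers:
    # an affine map, the ReLU, and a second affine map.
    w, b, v, c = theta
    return v * relu(w * x + b) + c


def risk(theta, target, grid):
    # Mean squared error over a grid (assumed stand-in for an L^2 risk).
    return sum((realization(theta, x) - target(x)) ** 2 for x in grid) / len(grid)


def risk_gradient(theta, target, grid):
    # Gradient of the grid risk with respect to (w, b, v, c).
    # Assumption: the derivative of the ReLU at 0 is taken to be 0.
    w, b, v, c = theta
    grad = [0.0, 0.0, 0.0, 0.0]
    for x in grid:
        pre = w * x + b
        active = 1.0 if pre > 0.0 else 0.0
        err = 2.0 * (realization(theta, x) - target(x))
        grad[0] += err * v * active * x
        grad[1] += err * v * active
        grad[2] += err * relu(pre)
        grad[3] += err
    return [g / len(grid) for g in grad]


def euler_gradient_flow(theta, target, grid, step=0.05, n_steps=500):
    # Explicit Euler steps for the ODE theta'(t) = -grad risk(theta(t)).
    for _ in range(n_steps):
        grad = risk_gradient(theta, target, grid)
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


grid = [i / 100 for i in range(101)]  # sample points in [0, 1]
target = abs                          # a Lipschitz continuous target function
theta0 = [0.5, 0.1, 0.5, 0.0]         # hypothetical initial parameters
theta = euler_gradient_flow(theta0, target, grid)
```

Comparing `risk(theta0, target, grid)` with `risk(theta, target, grid)` shows how far the discretized flow has reduced the risk. Tracking the largest absolute parameter value along the iterates would give a numerical, non-rigorous look at the boundedness question raised in the abstract.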




