A Principle of Least Action for the Training of Neural Networks

09/17/2020
by Skander Karkar et al.

Neural networks achieve high generalization performance on many tasks despite being heavily over-parameterized. Since classical statistical learning theory struggles to explain this behavior, much recent effort has focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and of gaining better control over trained models. In this work, we adopt an alternative perspective, viewing the neural network as a dynamical system that displaces input particles over time. We conduct a series of experiments and, by analyzing the network's behavior through its displacements, we show the presence of a low-kinetic-energy bias in the network's transport map, and we link this bias to generalization performance. From this observation, we reformulate the learning problem as follows: find the neural networks that solve the task while transporting the data as efficiently as possible. This novel formulation allows us to derive regularity results for the solution network from Optimal Transport theory. From a practical viewpoint, it also allows us to propose a new learning algorithm, which automatically adapts to the complexity of the given task and leads to networks with high generalization ability even in low-data regimes.
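For concreteness, below is a minimal PyTorch sketch of the idea behind this reformulation: a residual network is read as a discrete dynamical system x_{k+1} = x_k + f_k(x_k), the per-block displacements f_k(x_k) play the role of velocities, and their summed squared norms give a discrete kinetic energy that is penalized alongside the task loss. The architecture, the penalty weight lam, and all names here are illustrative assumptions, not the authors' exact algorithm.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        v = self.f(x)  # displacement ("velocity") produced by this block
        return x + v, v

class LeastActionNet(nn.Module):
    def __init__(self, dim, depth, n_classes):
        super().__init__()
        self.blocks = nn.ModuleList(ResidualBlock(dim) for _ in range(depth))
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        kinetic = 0.0
        for block in self.blocks:
            x, v = block(x)
            # accumulate ||f_k(x_k)||^2 per sample: a discrete kinetic energy
            kinetic = kinetic + v.pow(2).sum(dim=1)
        return self.head(x), kinetic.mean()

# One training step: task loss plus the transport-cost ("least action") penalty.
model = LeastActionNet(dim=32, depth=8, n_classes=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 1e-2  # illustrative regularization weight, an assumption
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
logits, kinetic = model(x)
loss = nn.functional.cross_entropy(logits, y) + lam * kinetic
opt.zero_grad()
loss.backward()
opt.step()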


Related research

11/02/2022
Instance-Dependent Generalization Bounds via Optimal Transport
Existing generalization bounds fail to explain crucial factors that driv...

12/03/2019
Towards Understanding the Spectral Bias of Deep Learning
An intriguing phenomenon observed during training neural networks is the...

05/24/2018
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Many tasks in machine learning and signal processing can be solved by mi...

03/22/2022
On the (Non-)Robustness of Two-Layer Neural Networks in Different Learning Regimes
Neural networks are known to be highly sensitive to adversarial examples...

05/15/2022
Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
The lottery ticket hypothesis (LTH) has attracted attention because it c...

08/06/2020
Large-time asymptotics in deep learning
It is by now well-known that practical deep supervised learning may roug...

03/14/2023
Practically Solving LPN in High Noise Regimes Faster Using Neural Networks
We conduct a systematic study of solving the learning parity with noise ...
