Neural tangent kernel analysis of shallow α-Stable ReLU neural networks

by Stefano Favaro et al.

There is a recent literature on large-width properties of Gaussian neural networks (NNs), i.e. NNs whose weights are distributed according to Gaussian distributions. Two popular problems are: i) the study of the large-width behaviour of NNs, which provides a characterization of the infinitely wide limit of a rescaled NN in terms of a Gaussian process; ii) the study of the training dynamics of NNs, which establishes a large-width equivalence between training the rescaled NN and performing a kernel regression with a deterministic kernel, referred to as the neural tangent kernel (NTK). In this paper, we consider these problems for α-Stable NNs, which generalize Gaussian NNs by assuming that the NN's weights are distributed according to α-Stable distributions with α∈(0,2], i.e. distributions with heavy tails. For shallow α-Stable NNs with a ReLU activation function, we show that as the NN's width goes to infinity, a rescaled NN converges weakly to an α-Stable process, i.e. a stochastic process with α-Stable finite-dimensional distributions. As a novelty with respect to the Gaussian setting, in the α-Stable setting the choice of the activation function affects the scaling of the NN: to achieve the infinitely wide α-Stable process, the ReLU activation requires an additional logarithmic term in the scaling compared with sub-linear activation functions. Our main contribution is then the NTK analysis of shallow α-Stable ReLU-NNs, which leads to a large-width equivalence between training a rescaled NN and performing a kernel regression with an (α/2)-Stable random kernel. The randomness of this kernel is a further novelty with respect to the Gaussian setting: in the α-Stable setting, the randomness of the NN at initialization does not vanish in the NTK analysis, and it thus induces a distribution for the kernel of the underlying kernel regression.
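As a concrete illustration of the setup described above, the following sketch initializes a shallow α-Stable ReLU NN with symmetric α-Stable weights, sampled via the standard Chambers–Mallows–Stuck method, and evaluates the rescaled output at a single input. This is a minimal sketch, not the paper's construction: the particular scaling factor (n log n)^{-1/α}, meant to reflect the logarithmic correction for ReLU mentioned in the abstract, is an illustrative assumption, and the exact normalization is the one stated in the paper.

```python
import numpy as np


def sample_symmetric_stable(alpha, size, rng):
    """Draw symmetric alpha-stable variates via the Chambers-Mallows-Stuck method."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    W = rng.exponential(1.0, size)                # unit exponential
    if alpha == 1.0:
        return np.tan(U)                          # Cauchy case
    return (np.sin(alpha * U) / np.cos(U) ** (1.0 / alpha)) * \
           (np.cos((1.0 - alpha) * U) / W) ** ((1.0 - alpha) / alpha)


def shallow_stable_relu(x, n, alpha, rng):
    """Rescaled output of a shallow alpha-stable ReLU NN at initialization.

    The scaling gamma_n = (n log n)^{-1/alpha} is a hypothetical choice used
    here only to illustrate the extra logarithmic factor required by ReLU.
    """
    d = x.shape[0]
    W1 = sample_symmetric_stable(alpha, (n, d), rng)  # input-to-hidden weights
    w2 = sample_symmetric_stable(alpha, n, rng)       # hidden-to-output weights
    hidden = np.maximum(W1 @ x, 0.0)                  # ReLU activation
    gamma_n = (n * np.log(n)) ** (-1.0 / alpha)       # assumed rescaling
    return gamma_n * np.sum(w2 * hidden)


rng = np.random.default_rng(0)
out = shallow_stable_relu(np.ones(3), n=10_000, alpha=1.8, rng=rng)
```

Because the weights are heavy-tailed, the empirical distribution of `out` across seeds is itself heavy-tailed, in line with the α-Stable (rather than Gaussian) infinite-width limit described above.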


