Correlation Functions in Random Fully Connected Neural Networks at Finite Width

04/03/2022
by Boris Hanin, et al.

This article considers fully connected neural networks with Gaussian random weights and biases and L hidden layers, each of width proportional to a large parameter n. For polynomially bounded non-linearities we give sharp estimates in powers of 1/n for the joint correlation functions of the network output and its derivatives. Moreover, we obtain exact layerwise recursions for these correlation functions and solve a number of special cases for classes of non-linearities including ReLU and tanh. We find in both cases that the depth-to-width ratio L/n plays the role of an effective network depth, controlling both the scale of fluctuations at individual neurons and the size of inter-neuron correlations. We use this to study a somewhat simplified version of the so-called exploding and vanishing gradient problem, proving that this particular variant occurs if and only if L/n is large. Several of the key ideas in this article were first developed at a physics level of rigor in a recent monograph with Roberts and Yaida.
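The role of L/n as an effective depth can also be probed numerically. Below is a minimal sketch (not from the paper) that samples random fully connected ReLU networks with i.i.d. Gaussian weights and measures the relative fluctuation of the squared output across draws of the parameters, for several depth-to-width ratios. The specific choices here, such as the He-style weight variance C_W = 2, zero biases, and the fluctuation proxy, are illustrative assumptions rather than the paper's exact construction.

```python
# Illustrative sketch: how output fluctuations in random ReLU networks
# grow with the depth-to-width ratio L/n. Assumptions (not from the paper):
# C_W = 2 weight variance for ReLU, zero biases, a fixed unit-norm input,
# and variance of the squared output as a simple fluctuation proxy.

import numpy as np


def random_relu_network_output(x, depth, width, c_w=2.0, c_b=0.0, rng=None):
    """Propagate x through `depth` hidden ReLU layers of size `width` with
    i.i.d. Gaussian weights of variance c_w / fan_in and biases of variance
    c_b, then apply a Gaussian linear read-out to a single output neuron."""
    rng = np.random.default_rng() if rng is None else rng
    h = x
    for _ in range(depth):
        fan_in = h.shape[0]
        W = rng.normal(0.0, np.sqrt(c_w / fan_in), size=(width, fan_in))
        b = rng.normal(0.0, np.sqrt(c_b), size=width)
        h = np.maximum(W @ h + b, 0.0)  # ReLU non-linearity
    w_out = rng.normal(0.0, np.sqrt(c_w / width), size=width)
    return w_out @ h


def squared_output_fluctuation(depth, width, n_samples=2000, seed=0):
    """Relative variance of the squared output over random parameter draws,
    used here as a rough proxy for finite-width fluctuations at a neuron."""
    rng = np.random.default_rng(seed)
    x = np.ones(width) / np.sqrt(width)  # fixed unit-norm input
    z2 = np.array(
        [random_relu_network_output(x, depth, width, rng=rng) ** 2
         for _ in range(n_samples)]
    )
    return z2.var() / z2.mean() ** 2


if __name__ == "__main__":
    # Fluctuations grow as the effective depth L/n increases.
    for depth, width in [(4, 128), (16, 128), (64, 128)]:
        ratio = depth / width
        fluct = squared_output_fluctuation(depth, width)
        print(f"L/n = {ratio:.3f}  ->  relative fluctuation ~ {fluct:.2f}")
```

Running this for fixed width and increasing depth shows the fluctuations growing with L/n, consistent with the paper's picture of L/n as the parameter controlling the scale of finite-width effects.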
