The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes

12/23/2022
by Alexander Atanasov et al.

For small training set sizes P, the generalization error of wide neural networks is well approximated by the error of an infinite-width neural network (NN), in either the kernel (lazy) or mean-field/feature-learning (rich) regime. However, beyond a critical sample size P^*, we empirically find that the generalization of the finite-width network becomes worse than that of the infinite-width network. In this work, we study the transition from infinite-width behavior to this variance-limited regime as a function of sample size P and network width N. We find that finite-size effects can become relevant at very small dataset sizes, on the order of P^* ∼ √N, for polynomial regression with ReLU networks. We discuss the source of these effects using an argument based on the variance of the NN's final neural tangent kernel (NTK). The transition can be pushed to larger P by enhancing feature learning or by ensemble averaging the networks. We find that the learning curve for regression with the final NTK is an accurate approximation of the NN learning curve. Using this, we provide a toy model that also exhibits P^* ∼ √N scaling and has P-dependent benefits from feature learning.
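As a rough illustration of the variance-limited idea (not the paper's experiments or its toy model), the sketch below compares kernel regression with a clean "infinite-width" kernel against the same regression with a kernel perturbed by entrywise fluctuations of scale 1/√N, a stand-in for the finite-width NTK's variance. The RBF kernel, sine target, width stand-in N, ridge value, and noise model are all assumptions made for the example; only the qualitative behavior, agreement at small P and degradation once P grows well past roughly √N, is meant to echo the abstract.

```python
# Toy sketch only: a noisy kernel as a stand-in for a finite-width NTK.
# Assumptions (not from the paper): RBF kernel, 1-D sine target, i.i.d.
# Gaussian kernel fluctuations of scale 1/sqrt(N), small ridge for stability.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Z, length=0.5):
    # Squared-exponential kernel between two sets of points.
    d2 = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * length**2))

def kernel_regression_error(K_train, K_test_train, y_train, y_test, ridge=1e-6):
    # Ridge kernel regression: fit coefficients on train, report test MSE.
    alpha = np.linalg.solve(K_train + ridge * np.eye(len(y_train)), y_train)
    preds = K_test_train @ alpha
    return np.mean((preds - y_test) ** 2)

def target(X):
    return np.sin(2 * np.pi * X[:, 0])

N = 512                      # width stand-in: kernel noise has scale 1/sqrt(N)
X_test = rng.uniform(-1, 1, size=(500, 1))
y_test = target(X_test)

for P in [8, 32, 128, 512]:
    X_tr = rng.uniform(-1, 1, size=(P, 1))
    y_tr = target(X_tr)

    K_tr = rbf_kernel(X_tr, X_tr)
    K_te = rbf_kernel(X_test, X_tr)

    # "Infinite-width" learning curve: regression with the clean kernel.
    err_inf = kernel_regression_error(K_tr, K_te, y_tr, y_test)

    # "Finite-width" kernel: symmetric entrywise fluctuations of scale 1/sqrt(N),
    # mimicking the variance of the empirical NTK at width N.
    noise = rng.normal(scale=1.0 / np.sqrt(N), size=(P, P))
    noise_te = rng.normal(scale=1.0 / np.sqrt(N), size=(len(X_test), P))
    err_fin = kernel_regression_error(
        K_tr + 0.5 * (noise + noise.T), K_te + noise_te, y_tr, y_test
    )

    print(f"P={P:4d}  clean-kernel err={err_inf:.4f}  noisy-kernel err={err_fin:.4f}")
```

Averaging the noisy-kernel predictions over several independent draws of the perturbation plays the role of the ensemble averaging mentioned in the abstract: it suppresses the variance contribution and pushes the departure from the clean-kernel curve to larger P.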


Related research

04/06/2023 · Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks
We analyze the dynamics of finite width effects in wide but finite featu...

06/23/2020 · Statistical Mechanics of Generalization in Kernel Regression
Generalization beyond a training dataset is a main goal of machine learn...

12/06/2019 · A priori generalization error for two-layer ReLU neural network through minimum norm solution
We focus on estimating a priori generalization error of two-layer ReLU n...

07/01/2021 · Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks
We analyze the learning dynamics of infinitely wide neural networks with...

02/12/2021 · Explaining Neural Scaling Laws
The test loss of well-trained neural networks often follows precise powe...

09/07/2023 · Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
This work investigates the nuanced algorithm design choices for deep lea...
