Surprises in High-Dimensional Ridgeless Least Squares Interpolation

03/19/2019
by Trevor Hastie, et al.

Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum ℓ_2-norm interpolation in high-dimensional linear regression. Motivated by the connection with overparametrized neural networks, we consider the case of random features. We study two distinct models for the features' distribution: a linear model, in which the feature vectors x_i ∈ R^p are obtained by applying a linear transform to vectors of i.i.d. entries, x_i = Σ^{1/2} z_i (with z_i ∈ R^p); and a nonlinear model, in which the features are obtained by passing the input through a random one-layer neural network, x_i = φ(W z_i) (with z_i ∈ R^d, and φ an activation function acting independently on the coordinates of W z_i). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the generalization error and the potential benefit of overparametrization.
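The following is a minimal numerical sketch, not code from the paper: it fits the minimum ℓ_2-norm least squares solution β̂ = X⁺ y with isotropic Gaussian features (Σ = I) and prints the average test risk at several values of the overparametrization ratio p/n. All settings (n, the noise level, the grid of ratios) are illustrative assumptions; under them the risk typically spikes near p/n = 1 and decreases again beyond it, the double-descent shape described in the abstract.

import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200, 1.0              # training size and noise level (assumed for illustration)

def test_risk(p, n_test=2000):
    """Test risk of the min-norm least squares fit for one random problem instance."""
    beta = rng.normal(size=p) / np.sqrt(p)      # signal with ||beta||^2 ≈ 1
    X = rng.normal(size=(n, p))                 # linear feature model with Σ = I
    y = X @ beta + sigma * rng.normal(size=n)
    # Minimum-ℓ2-norm interpolator (p > n) / ordinary least squares (p < n): β̂ = X⁺ y
    beta_hat = np.linalg.pinv(X) @ y
    X_te = rng.normal(size=(n_test, p))
    y_te = X_te @ beta + sigma * rng.normal(size=n_test)
    return np.mean((X_te @ beta_hat - y_te) ** 2)

for gamma in [0.2, 0.5, 0.9, 1.1, 2.0, 5.0, 10.0]:
    p = int(gamma * n)
    risks = [test_risk(p) for _ in range(20)]
    print(f"gamma = p/n = {gamma:5.1f}   test risk ≈ {np.mean(risks):.2f}")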
