Overfitting Can Be Harmless for Basis Pursuit: Only to a Degree

by   Peizhong Ju, et al.

Recently, there have been significant interests in studying the generalization power of linear regression models in the overparameterized regime, with the hope that such analysis may provide the first step towards understanding why overparameterized deep neural networks generalize well even when they overfit the training data. Studies on min ℓ_2-norm solutions that overfit the training data have suggested that such solutions exhibit the "double-descent" behavior, i.e., the test error decreases with the number of features p in the overparameterized regime when p is larger than the number of samples n. However, for linear models with i.i.d. Gaussian features, for large p the model errors of such min ℓ_2-norm solutions approach the "null risk," i.e., the error of a trivial estimator that always outputs zero, even when the noise is very low. In contrast, we studied the overfitting solution of min ℓ_1-norm, which is known as Basis Pursuit (BP) in the compressed sensing literature. Under a sparse true linear model with i.i.d. Gaussian features, we show that for a large range of p up to a limit that grows exponentially with n, with high probability the model error of BP is upper bounded by a value that decreases with p and is proportional to the noise level. To the best of our knowledge, this is the first result in the literature showing that, without any explicit regularization in such settings where both p and the dimension of data are much larger than n, the test errors of a practical-to-compute overfitting solution can exhibit double-descent and approach the order of the noise level independently of the null risk. Our upper bound also reveals a descent floor for BP that is proportional to the noise level. Further, this descent floor is independent of n and the null risk, but increases with the sparsity level of the true model.


page 1

page 2

page 3

page 4


Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning

Meta-learning has arisen as a successful method for improving training p...

On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

In this paper, we study the generalization performance of min ℓ_2-norm o...

More Data Can Hurt for Linear Regression: Sample-wise Double Descent

In this expository note we describe a surprising phenomenon in overparam...

Conditioning of Random Feature Matrices: Double Descent and Generalization Error

We provide (high probability) bounds on the condition number of random f...

Foolish Crowds Support Benign Overfitting

We prove a lower bound on the excess risk of sparse interpolating proced...

Dimensionality reduction, regularization, and generalization in overparameterized regressions

Overparameterization in deep learning is powerful: Very large models fit...

Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression

Learning algorithms that divide the data into batches are prevalent in m...

Please sign up or login with your details

Forgot password? Click here to reset