Overfitting Can Be Harmless for Basis Pursuit: Only to a Degree
Recently, there has been significant interest in studying the generalization power of linear regression models in the overparameterized regime, with the hope that such analysis may provide a first step towards understanding why overparameterized deep neural networks generalize well even when they overfit the training data. Studies of min ℓ_2-norm solutions that overfit the training data have suggested that such solutions exhibit the "double-descent" behavior, i.e., the test error decreases with the number of features p in the overparameterized regime where p is larger than the number of samples n. However, for linear models with i.i.d. Gaussian features and large p, the model errors of such min ℓ_2-norm solutions approach the "null risk," i.e., the error of a trivial estimator that always outputs zero, even when the noise level is very low. In contrast, we study the overfitting solution with min ℓ_1-norm, known as Basis Pursuit (BP) in the compressed sensing literature. Under a sparse true linear model with i.i.d. Gaussian features, we show that, for a large range of p up to a limit that grows exponentially with n, with high probability the model error of BP is upper bounded by a value that decreases with p and is proportional to the noise level. To the best of our knowledge, this is the first result in the literature showing that, without any explicit regularization, in such settings where both p and the dimension of the data are much larger than n, the test error of a practical-to-compute overfitting solution can exhibit double descent and approach the order of the noise level independently of the null risk. Our upper bound also reveals a descent floor for BP that is proportional to the noise level. Further, this descent floor is independent of n and the null risk, but increases with the sparsity level of the true model.
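To make the setting concrete, the following is a minimal synthetic sketch (not the paper's experiments) of the two interpolating solutions compared in the abstract: the min ℓ_1-norm interpolator (Basis Pursuit, computed here via a standard linear-programming reformulation) and the min ℓ_2-norm interpolator, under a sparse true linear model with i.i.d. Gaussian features. All problem sizes, the sparsity level, and the noise level are illustrative choices, not values from the paper.

```python
# Minimal sketch: BP vs. min l2-norm interpolation on synthetic sparse data.
# Sizes (n, p, k, sigma) are illustrative assumptions, not the paper's settings.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, k, sigma = 50, 500, 3, 0.1   # samples, features, sparsity, noise level

# Sparse true model and i.i.d. Gaussian features.
beta_true = np.zeros(p)
beta_true[rng.choice(p, size=k, replace=False)] = rng.standard_normal(k)
X = rng.standard_normal((n, p))
y = X @ beta_true + sigma * rng.standard_normal(n)

# Basis Pursuit: min ||beta||_1 subject to X beta = y (overfits the training data).
# LP reformulation: beta = u - v with u, v >= 0; minimize sum(u) + sum(v).
c = np.ones(2 * p)
A_eq = np.hstack([X, -X])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
beta_bp = res.x[:p] - res.x[p:]

# Min l2-norm interpolator: pseudo-inverse solution X^+ y.
beta_l2 = np.linalg.pinv(X) @ y

null_risk = np.sum(beta_true ** 2)
err_bp = np.sum((beta_bp - beta_true) ** 2)   # typically on the order of the noise level
err_l2 = np.sum((beta_l2 - beta_true) ** 2)   # typically close to the null risk
print(f"null risk      : {null_risk:.4f}")
print(f"BP model error : {err_bp:.4f}")
print(f"l2 model error : {err_l2:.4f}")
```

In this sketch the "model error" is ||β̂ − β*||_2^2 and the "null risk" is ||β*||_2^2, matching the abstract's description of the trivial all-zeros estimator; both interpolators fit the noisy training data exactly, but only BP's error tracks the noise level rather than the null risk in this regime.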