On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

06/10/2020 ∙ by Denny Wu, et al.

We consider the linear model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta}_\star + \boldsymbol{\epsilon}$ with $\mathbf{X}\in\mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\boldsymbol{\beta}_\star$ via generalized (weighted) ridge regression: $\hat{\boldsymbol{\beta}}_\lambda = (\mathbf{X}^T\mathbf{X} + \lambda\boldsymbol{\Sigma}_w)^\dagger \mathbf{X}^T\mathbf{y}$, where $\boldsymbol{\Sigma}_w$ is the weighting matrix. Assuming a random effects model with general data covariance $\boldsymbol{\Sigma}_x$ and anisotropic prior on the true coefficients $\boldsymbol{\beta}_\star$, i.e., $\mathbb{E}\boldsymbol{\beta}_\star\boldsymbol{\beta}_\star^T = \boldsymbol{\Sigma}_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^T\hat{\boldsymbol{\beta}}_\lambda)^2$ in the proportional asymptotic limit $p/n\to\gamma\in(1,\infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that decide the sign of the optimal setting $\lambda_{\mathrm{opt}}$ for the ridge parameter $\lambda$ and confirm the implicit $\ell_2$ regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that $\lambda_{\mathrm{opt}}$ can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when $\mathbf{X}$ and $\boldsymbol{\beta}_\star$ are non-isotropic. Finally, we determine the optimal $\boldsymbol{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\mathrm{opt}}$) case, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
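As a concrete illustration of the estimator defined in the abstract, the following is a minimal NumPy sketch, not the authors' code. The function names `weighted_ridge` and `prediction_risk`, the dimensions, the Gaussian data, the identity choices for $\boldsymbol{\Sigma}_x$ and $\boldsymbol{\Sigma}_w$, the noise variance, and the pseudo-inverse cutoff `rcond=1e-10` are all assumptions made here for illustration. The risk function uses the standard identity $\mathbb{E}(y-\mathbf{x}^T\hat{\boldsymbol{\beta}})^2 = (\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}_\star)^T\boldsymbol{\Sigma}_x(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}_\star) + \sigma^2$, which holds for a fresh mean-zero $\mathbf{x}$ with covariance $\boldsymbol{\Sigma}_x$ and independent noise of variance $\sigma^2$.

```python
import numpy as np


def weighted_ridge(X, y, lam, Sigma_w, rcond=1e-10):
    """Generalized ridge estimate (X^T X + lam * Sigma_w)^dagger X^T y.

    rcond is an illustrative cutoff for the pseudo-inverse (an assumption,
    not a value from the paper); it discards numerically zero singular values.
    """
    gram = X.T @ X + lam * Sigma_w
    return np.linalg.pinv(gram, rcond=rcond) @ (X.T @ y)


def prediction_risk(beta_hat, beta_star, Sigma_x, noise_var):
    """Risk E(y - x^T beta_hat)^2 for a fresh, mean-zero x with covariance Sigma_x."""
    diff = beta_hat - beta_star
    return float(diff @ Sigma_x @ diff + noise_var)


# Hypothetical toy problem in the overparameterized regime (p > n, p/n = 2).
rng = np.random.default_rng(0)
n, p = 50, 100
Sigma_x = np.eye(p)          # isotropic data covariance (assumption)
Sigma_w = np.eye(p)          # identity weighting, i.e. standard ridge (assumption)
noise_var = 0.25
beta_star = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta_star + np.sqrt(noise_var) * rng.normal(size=n)

# Regularized estimate and the ridgeless estimate obtained at lam = 0.
beta_ridge = weighted_ridge(X, y, lam=1.0, Sigma_w=Sigma_w)
beta_ridgeless = weighted_ridge(X, y, lam=0.0, Sigma_w=Sigma_w)

risk_ridge = prediction_risk(beta_ridge, beta_star, Sigma_x, noise_var)
risk_ridgeless = prediction_risk(beta_ridgeless, beta_star, Sigma_x, noise_var)
print("risk at lam=1:", risk_ridge)
print("risk at lam=0:", risk_ridgeless)
```

At `lam=0.0` the pseudo-inverse yields the minimum-norm solution of the least-squares problem, which fits the training data exactly when `X` has full row rank. Sweeping `lam` over a grid and comparing `prediction_risk` values gives a finite-sample, single-draw picture only; it does not reproduce the asymptotic characterization described in the abstract.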
