On the Optimal Weighted ℓ_2 Regularization in Overparameterized Linear Regression
We consider the linear model 𝐲 = 𝐗β_⋆ + ϵ with 𝐗 ∈ ℝ^{n×p} in the overparameterized regime p > n. We estimate β_⋆ via generalized (weighted) ridge regression: β̂_λ = (𝐗^T𝐗 + λΣ_w)^† 𝐗^T𝐲, where Σ_w is the weighting matrix. Assuming a random effects model with general data covariance Σ_x and an anisotropic prior on the true coefficients β_⋆, i.e., 𝔼β_⋆β_⋆^T = Σ_β, we provide an exact characterization of the prediction risk 𝔼(y − 𝐱^Tβ̂_λ)^2 in the proportional asymptotic limit p/n → γ ∈ (1, ∞). Our general setup leads to a number of interesting findings. We outline precise conditions that decide the sign of the optimal setting λ_opt for the ridge parameter λ and confirm the implicit ℓ_2 regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that λ_opt can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when 𝐗 and β_⋆ are non-isotropic. Finally, we determine the optimal Σ_w for both the ridgeless (λ → 0) and optimally regularized (λ = λ_opt) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
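To make the estimator and the risk concrete, here is a minimal NumPy sketch of β̂_λ = (𝐗^T𝐗 + λΣ_w)^† 𝐗^T𝐲 together with a Monte Carlo estimate of the prediction risk 𝔼(y − 𝐱^Tβ̂_λ)^2 at finite n and p. It assumes Gaussian features x ~ N(0, Σ_x), a Gaussian prior β_⋆ ~ N(0, Σ_β), and Gaussian noise with standard deviation σ; these distributions, the function names `weighted_ridge` and `prediction_risk`, the pseudo-inverse cutoff, and all parameter values are hypothetical choices for illustration. This is not the paper's code, and it simulates the risk rather than evaluating the exact asymptotic characterization.

```python
import numpy as np

def weighted_ridge(X, y, lam, Sigma_w, rcond=1e-10):
    """Weighted ridge estimate (X^T X + lam * Sigma_w)^dagger X^T y.

    The Moore-Penrose pseudo-inverse keeps the expression defined when the
    matrix is singular (e.g. lam = 0 with p > n). The explicit rcond cutoff
    (an assumed value) discards numerically-zero singular values of X^T X.
    """
    A = X.T @ X + lam * Sigma_w
    return np.linalg.pinv(A, rcond=rcond) @ (X.T @ y)

def prediction_risk(n, p, lam, Sigma_x, Sigma_beta, Sigma_w,
                    sigma=0.5, trials=50, seed=0):
    """Monte Carlo estimate of E(y - x^T beta_hat)^2 under ASSUMED Gaussian
    design, Gaussian prior on beta_star, and Gaussian noise."""
    rng = np.random.default_rng(seed)
    Lx = np.linalg.cholesky(Sigma_x)      # Sigma_x = Lx Lx^T
    Lb = np.linalg.cholesky(Sigma_beta)   # Sigma_beta = Lb Lb^T
    risks = []
    for _ in range(trials):
        beta_star = Lb @ rng.standard_normal(p)        # E beta beta^T = Sigma_beta
        X = rng.standard_normal((n, p)) @ Lx.T         # rows x_i ~ N(0, Sigma_x)
        y = X @ beta_star + sigma * rng.standard_normal(n)
        beta_hat = weighted_ridge(X, y, lam, Sigma_w)
        d = beta_hat - beta_star
        # For a fresh test point y = x^T beta_star + eps, independent of the
        # training data: E(y - x^T beta_hat)^2 = d^T Sigma_x d + sigma^2.
        risks.append(d @ Sigma_x @ d + sigma**2)
    return float(np.mean(risks))

if __name__ == "__main__":
    n, p = 100, 200                                # gamma = p / n = 2
    Sigma_x = np.diag(np.linspace(0.5, 2.0, p))    # hypothetical anisotropic covariance
    Sigma_beta = np.eye(p) / p                     # hypothetical isotropic prior
    Sigma_w = np.eye(p)                            # standard ridge weighting
    for lam in [-0.5, 0.0, 1.0]:                   # includes a negative ridge parameter
        print(lam, prediction_risk(n, p, lam, Sigma_x, Sigma_beta, Sigma_w))
```

Setting `lam = 0` exactly returns the minimum ℓ_2-norm interpolator. When Σ_w is positive definite but not proportional to the identity, the λ → 0 limit instead approaches the interpolator minimizing ‖Σ_w^{1/2}β‖, so a small positive `lam` is the better proxy for the weighted ridgeless case in this sketch.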