Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

by Zitong Yang et al.

The classical bias-variance trade-off predicts that bias decreases and variance increases with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which larger models are often observed to generalize better. We provide a simple explanation by directly measuring the bias and variance of neural networks: while the bias is monotonically decreasing, as in the classical theory, the variance is unimodal (bell-shaped), first increasing and then decreasing with the width of the network. We vary the network architecture, loss function, and dataset, and confirm that this variance unimodality occurs robustly for all models we consider. Since the risk curve is the sum of the bias and variance curves, it displays different qualitative shapes depending on their relative scales, with the double-descent curve observed in recent literature arising as a special case. We corroborate these empirical results with a theoretical analysis of two-layer linear networks with a random first layer. Finally, evaluation on out-of-distribution data shows that most of the drop in accuracy comes from increased bias, while variance increases by a relatively small amount. Moreover, we find that deeper models decrease bias and increase variance on both in-distribution and out-of-distribution data.
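The bias and variance measured in the paper are defined with respect to the randomness of the training set: bias² compares the average prediction to the noiseless target, and variance measures how much predictions fluctuate across training sets. A minimal sketch of this decomposition under squared loss, using a toy polynomial regressor rather than a neural network (the function names and the sin-plus-noise setup below are illustrative assumptions, not the paper's code):

```python
import numpy as np

def estimate_bias_variance(fit_predict, make_train_set, x_test, f_test,
                           n_trials=100, seed=0):
    """Monte Carlo estimate of bias^2 and variance under squared loss.

    fit_predict(X, y, x_test): trains on (X, y), returns predictions at x_test.
    make_train_set(rng): draws a fresh random training sample (X, y).
    f_test: noiseless target values at x_test.
    """
    rng = np.random.default_rng(seed)
    preds = np.stack([fit_predict(*make_train_set(rng), x_test)
                      for _ in range(n_trials)])
    mean_pred = preds.mean(axis=0)          # average prediction over training sets
    bias_sq = np.mean((mean_pred - f_test) ** 2)
    variance = np.mean(preds.var(axis=0))   # prediction spread, averaged over x
    return bias_sq, variance

# Toy illustration: polynomial regression on noisy sin data,
# with polynomial degree playing the role of model complexity.
def make_train_set(rng):
    X = rng.uniform(-3.0, 3.0, size=30)
    return X, np.sin(X) + rng.normal(0.0, 0.3, size=30)

def poly_model(degree):
    return lambda X, y, x_test: np.polyval(np.polyfit(X, y, degree), x_test)

x_test = np.linspace(-3.0, 3.0, 100)
f_test = np.sin(x_test)
b1, v1 = estimate_bias_variance(poly_model(1), make_train_set, x_test, f_test)
b7, v7 = estimate_bias_variance(poly_model(7), make_train_set, x_test, f_test)
# Classically, the degree-7 model shows lower bias but higher variance
# than the degree-1 model; the paper's point is that for wide networks
# the variance eventually turns back down.
```

Note this toy model reproduces only the classical regime; the paper's unimodal variance emerges for over-parameterized networks, where the same estimator applied to networks of increasing width yields a rise-then-fall variance curve.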



