Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

by Rylan Schaeffer, et al.

Double descent is a surprising phenomenon in machine learning, in which, as the number of model parameters grows relative to the number of data points, test error drops as models grow ever larger into the highly overparameterized (data-undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data points, the dimensionality of the data, and the number of model parameters. Here, we briefly describe double descent, then explain why it occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when all simultaneously present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors is ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.
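The linear-regression setting the abstract describes can be sketched in a few lines of NumPy: fit minimum-norm least squares while varying the number of features (parameters) past the interpolation threshold, and watch test error spike there before falling again. This is a toy illustration under assumed isotropic Gaussian data, not the paper's exact experiments; all names and parameter choices below are illustrative.

```python
import numpy as np

def double_descent_curve(d=40, n_train=20, n_test=500, noise=0.5,
                         n_trials=20, seed=0):
    """Average test MSE of minimum-norm least squares vs. number of
    features p used, for p = 1..d.

    The pseudoinverse gives the ordinary least-squares fit when p < n_train
    and the minimum-norm interpolating fit when p >= n_train, so sweeping p
    crosses the interpolation threshold at p = n_train.
    """
    rng = np.random.default_rng(seed)
    errors = np.zeros(d)
    for _ in range(n_trials):
        # Ground-truth linear model with isotropic Gaussian features.
        w_true = rng.normal(size=d) / np.sqrt(d)
        X_tr = rng.normal(size=(n_train, d))
        X_te = rng.normal(size=(n_test, d))
        y_tr = X_tr @ w_true + noise * rng.normal(size=n_train)
        y_te = X_te @ w_true  # noiseless targets for clean test error
        for p in range(1, d + 1):
            # Fit using only the first p features (the "model size" knob).
            w_hat = np.linalg.pinv(X_tr[:, :p]) @ y_tr
            errors[p - 1] += np.mean((X_te[:, :p] @ w_hat - y_te) ** 2)
    return errors / n_trials

errs = double_descent_curve()
```

Plotting `errs` against `p` shows the characteristic peak near `p = n_train` (where the training design matrix is nearly singular, so label noise is amplified) and a second descent in the overparameterized regime `p > n_train`.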




More Data Can Hurt for Linear Regression: Sample-wise Double Descent

In this expository note we describe a surprising phenomenon in overparam...

Analysis of Interpolating Regression Models and the Double Descent Phenomenon

A regression model with more parameters than data points in the training...

Does Double Descent Occur in Self-Supervised Learning?

Most investigations into double descent have focused on supervised model...

A Geometric Look at Double Descent Risk: Volumes, Singularities, and Distinguishabilities

The appearance of the double-descent risk phenomenon has received growin...

Dropout Drops Double Descent

In this paper, we find and analyze that we can easily drop the double de...

Deep Double Descent via Smooth Interpolation

Overparameterized deep networks are known to be able to perfectly fit th...

Do Deeper Convolutional Networks Perform Better?

Over-parameterization is a recent topic of much interest in the machine ...
