The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models

02/18/2023
by Danil Akhtiamov, et al.

Despite being highly over-parametrized and having the ability to fully interpolate the training data, deep networks are known to generalize well to unseen data. It is now understood that part of the reason is that the training algorithms used have certain implicit regularization properties that ensure interpolating solutions with "good" properties are found. This is best understood in over-parametrized linear models, where it has been shown that the celebrated stochastic gradient descent (SGD) algorithm finds the interpolating solution closest in Euclidean distance to the initial weight vector. Different regularizers, replacing Euclidean distance with a Bregman divergence, can be obtained by replacing SGD with stochastic mirror descent (SMD). Empirical observations have shown that in the deep network setting, SMD achieves a generalization performance that differs from that of SGD and that depends on the choice of SMD's potential function. In an attempt to begin to understand this behavior, we obtain the generalization error of SMD for over-parametrized linear models in a binary classification problem where the two classes are drawn from a Gaussian mixture model. We present simulation results that validate the theory and, in particular, introduce two data models: one for which SMD with an ℓ_2 regularizer (i.e., SGD) outperforms SMD with an ℓ_1 regularizer, and one for which the reverse happens.
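For concreteness, the SMD update for a potential ψ is w_{t+1} = ∇ψ^{-1}(∇ψ(w_t) − η ∇ℓ_i(w_t)); with ψ(w) = ½‖w‖_2^2 this reduces to SGD, while a q-norm potential with q closer to 1 biases the interpolating solution toward smaller ℓ_1 norm. The NumPy sketch below illustrates this update on a toy over-parametrized linear model with Gaussian-mixture data. The function name, the per-sample squared loss, the step size, and the data-generation details are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def smd_interpolate(X, y, q=2.0, lr=0.01, epochs=2000, seed=0):
    """Stochastic mirror descent for the squared loss with potential
    psi(w) = (1/q) * ||w||_q^q.  q = 2 recovers plain SGD; q closer to 1
    mimics an l_1-type implicit regularizer.  (Illustrative sketch only.)"""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)                            # start at the origin
    z = np.sign(w) * np.abs(w) ** (q - 1.0)    # mirror variable z = grad psi(w)
    for _ in range(epochs):
        for i in rng.permutation(n):
            resid = X[i] @ w - y[i]            # per-sample squared-loss residual
            z -= lr * resid * X[i]             # gradient step in mirror space
            w = np.sign(z) * np.abs(z) ** (1.0 / (q - 1.0))  # map back via grad psi*
    return w

# Toy over-parametrized Gaussian-mixture data: d >> n, labels y in {-1, +1},
# class means +/- mu (all quantities below are made up for illustration).
rng = np.random.default_rng(1)
n, d = 25, 400
mu = rng.normal(size=d) / np.sqrt(d)
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + 0.3 * rng.normal(size=(n, d))
w_l2 = smd_interpolate(X, y, q=2.0)   # SGD / l_2-type implicit bias
w_l1 = smd_interpolate(X, y, q=1.5)   # biased toward a smaller l_1 norm
```

In this sketch the only thing that changes between the two runs is the potential ψ, yet the two runs are driven toward different interpolating solutions and therefore different generalization errors, which is the effect the paper quantifies.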

Related research

06/04/2018 · Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization
Stochastic descent methods (of the gradient and mirror varieties) have b...

02/22/2022 · Explicit Regularization via Regularizer Mirror Descent
Despite perfectly interpolating the training data, deep neural networks ...

04/15/2019 · The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
The goal of this paper is to study why stochastic gradient descent (SGD)...

06/10/2019 · Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization
Most modern learning problems are highly overparameterized, meaning that...

11/01/2018 · Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications
Deep neural networks with remarkably strong generalization performances ...

04/03/2019 · A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality
Stochastic mirror descent (SMD) is a fairly new family of algorithms tha...

06/03/2022 · Generalization for multiclass classification with overparameterized linear models
Via an overparameterized linear model with Gaussian features, we provide...
