Stochastic Gradient Descent applied to Least Squares regularizes in Sobolev spaces

07/27/2020
by Stefan Steinerberger, et al.

We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \to \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that

$$\mathbb{E}\,\|Ax_{k+1} - b\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right) \|Ax_k - b\|_2^2 - \frac{2}{\|A\|_F^2}\, \|A^T(Ax_k - b)\|_2^2.$$

This is a curious inequality: when applied to a discretization of a partial differential equation such as $-\Delta u = f$, the last term measures the regularity of the residual $u_k - u$ in a higher Sobolev space than the remaining terms: if $u_k - u$ has large fourth derivatives (i.e., a large bi-Laplacian $\Delta^2$), then SGD will dramatically decrease the size of the second derivatives (i.e., the Laplacian $\Delta$) of $u_k - u$. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This implies a regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
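The inequality suggests a simple numerical experiment. The following is a minimal sketch, not code from the paper: it runs SGD in its randomized-Kaczmarz form (row $a_i$ sampled with probability $\|a_i\|^2 / \|A\|_F^2$, step size $1/\|a_i\|^2$) on a standard tridiagonal discretization of $-\Delta$ in one dimension, and tracks the residual on the two Sobolev scales the inequality compares. The grid size, random seed, and iteration counts are arbitrary choices.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not taken from the paper):
# SGD in randomized-Kaczmarz form applied to ||Ax - b||_2^2, where A is
# the standard tridiagonal discretization of -Laplacian on a 1D grid.
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

rng = np.random.default_rng(0)
x_true = rng.standard_normal(n)
b = A @ x_true

row_norms_sq = np.sum(A**2, axis=1)
probs = row_norms_sq / row_norms_sq.sum()  # sample row i with prob ||a_i||^2 / ||A||_F^2

x = np.zeros(n)
for k in range(20001):
    if k % 5000 == 0:
        r = A @ x - b  # = A(x - x_true): discrete Laplacian of the error (Delta scale)
        print(k, np.linalg.norm(r), np.linalg.norm(A.T @ r))  # A^T r: bi-Laplacian scale (Delta^2)
    i = rng.choice(n, p=probs)
    a_i = A[i]
    # SGD step on the i-th summand with step size 1/||a_i||^2 (the Kaczmarz step).
    x -= (a_i @ x - b[i]) / row_norms_sq[i] * a_i
```

Here $\|A(x_k - x_{\mathrm{true}})\|$ plays the role of the $\Delta$-norm of the error $u_k - u$, and $\|A^T A(x_k - x_{\mathrm{true}})\|$ the $\Delta^2$-norm. If the regularization phenomenon described above holds in this setting, the ratio of the second printed norm to the first should decrease over the iterations, as energy cascades from the large singular values (high frequencies) to the small ones.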
