Stochastic Gradient Descent applied to Least Squares regularizes in Sobolev spaces

07/27/2020
by Stefan Steinerberger, et al.

We study the behavior of stochastic gradient descent applied to ‖Ax − b‖_2^2 → min for invertible A ∈ ℝ^{n × n}. We show that there is an explicit constant c_A depending (mildly) on A such that

𝔼 ‖Ax_{k+1} − b‖_2^2 ≤ (1 + c_A/‖A‖_F^2) ‖Ax_k − b‖_2^2 − (2/‖A‖_F^2) ‖A^T(Ax_k − b)‖_2^2.

This is a curious inequality: when applied to a discretization of a partial differential equation like −Δu = f, the last term measures the regularity of the residual u_k − u in a higher Sobolev space than the remaining terms: if u_k − u has large fourth derivatives (i.e. a large bi-Laplacian Δ^2), then SGD will dramatically decrease the size of the second derivatives (i.e. Δ) of u_k − u. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This implies a regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
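The inequality can be observed numerically. Below is a minimal sketch, assuming a Kaczmarz-type SGD step in which rows a_i of A are sampled with probability ‖a_i‖_2^2/‖A‖_F^2 (one standard way of running SGD on the least-squares objective; the paper's exact step size and sampling may differ), with A taken to be the 1D discrete Laplacian so that the two residual norms correspond to the second- and fourth-derivative quantities mentioned above.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1D discrete Laplacian A, so Ax = b is a discretization of -u'' = f.
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x_true = rng.standard_normal(n)
b = A @ x_true

row_norms_sq = np.sum(A**2, axis=1)       # ||a_i||_2^2 for each row
probs = row_norms_sq / row_norms_sq.sum() # sample rows proportionally to squared norm

x = np.zeros(n)
for k in range(20001):
    if k % 5000 == 0:
        res = A @ x - b
        # ||Ax_k - b||_2^2 plays the role of the lower-order (Laplacian) quantity,
        # ||A^T(Ax_k - b)||_2^2 the higher-order (bi-Laplacian) quantity in the inequality.
        print(k, np.linalg.norm(res)**2, np.linalg.norm(A.T @ res)**2)
    i = rng.choice(n, p=probs)
    a_i = A[i]
    # Kaczmarz-type SGD step: project onto the hyperplane <a_i, x> = b_i.
    x = x - (a_i @ x - b[i]) / row_norms_sq[i] * a_i

Running the sketch, the higher-order quantity ‖A^T(Ax_k − b)‖_2^2 decays much faster than ‖Ax_k − b‖_2^2, consistent with the energy cascade from large to small singular values described in the abstract.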
