Higher Order Generalization Error for First Order Discretization of Langevin Diffusion
We propose a novel approach to analyzing the generalization error of discretizations of Langevin diffusion, such as stochastic gradient Langevin dynamics (SGLD). For an ϵ tolerance on the expected generalization error, it is known that a first order discretization can reach this target if we run Ω(ϵ^{-1} log(ϵ^{-1})) iterations with Ω(ϵ^{-1}) samples. In this article, we show that, under additional smoothness assumptions, even first order methods can achieve an arbitrarily improved runtime complexity. More precisely, for each N > 0, we provide a sufficient smoothness condition on the loss function such that a first order discretization reaches ϵ expected generalization error given Ω(ϵ^{-1/N} log(ϵ^{-1})) iterations with Ω(ϵ^{-1}) samples.
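For concreteness, the first order discretization referred to above is the Euler–Maruyama scheme applied to the Langevin diffusion, with the full gradient replaced by a stochastic (e.g. minibatch) estimate in the case of SGLD. The step size η, inverse temperature β, and gradient estimate \widehat{\nabla L} below are generic symbols used only for illustration, not the specific constants appearing in the bounds above.

\[
\mathrm{d}\theta_t = -\nabla L(\theta_t)\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}B_t \qquad \text{(Langevin diffusion)}
\]
\[
\theta_{k+1} = \theta_k - \eta\,\widehat{\nabla L}(\theta_k) + \sqrt{2\eta\beta^{-1}}\,\xi_k, \qquad \xi_k \sim \mathcal{N}(0, I_d) \qquad \text{(one first order step, SGLD when } \widehat{\nabla L} \text{ is a minibatch gradient)}
\]

The iteration counts quoted above bound how many such steps suffice to bring the expected generalization error below ϵ.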