A Variational Analysis of Stochastic Gradient Algorithms

02/08/2016
by Stephan Mandt, et al.

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With a constant learning rate, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be used effectively as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD so that the resulting stationary distribution matches the posterior. The analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior, in the spirit of variational inference. In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and use properties of this process to derive the optimal parameters. This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring from this perspective. We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models.
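
To make the setup concrete, below is a minimal sketch (not code from the paper) of the idea the abstract describes: constant-rate SGD on a Bayesian linear-regression model, with the post-burn-in iterates treated as approximate posterior samples. The model, the minibatch size, and the learning rate `eps` are illustrative assumptions; the covariance of the iterates will only match the exact posterior covariance once the rate is tuned as the paper prescribes.

```python
# Illustrative sketch (assumptions, not the paper's exact algorithm): run
# constant-rate SGD on a Bayesian linear-regression loss and compare the
# stationary spread of the iterates with the exact Gaussian posterior.
import numpy as np

rng = np.random.default_rng(0)
N, D, S = 10_000, 2, 50            # data size, dimension, minibatch size (assumed)
sigma2, tau2 = 1.0, 10.0           # observation-noise and prior variances (assumed)

X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true + np.sqrt(sigma2) * rng.normal(size=N)

# Exact Gaussian posterior, for reference.
post_prec = X.T @ X / sigma2 + np.eye(D) / tau2
post_cov = np.linalg.inv(post_prec)
post_mean = post_cov @ (X.T @ y / sigma2)

# Constant-rate SGD on the per-datum (1/N-scaled) negative log posterior.
eps = 1e-3                         # constant learning rate (assumed, not optimized)
w = np.zeros(D)
samples = []
for t in range(20_000):
    idx = rng.choice(N, size=S, replace=False)
    # Minibatch gradient of the 1/N-scaled negative log posterior; this is an
    # unbiased estimate of the full gradient.
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / (S * sigma2) + w / (N * tau2)
    w = w - eps * grad
    if t > 5_000:                  # discard burn-in, keep "stationary" iterates
        samples.append(w.copy())

samples = np.array(samples)
print("posterior mean:   ", post_mean)
print("SGD iterate mean: ", samples.mean(axis=0))
print("posterior cov:\n", post_cov)
print("SGD iterate cov:\n", np.cov(samples.T))
```

With an arbitrary `eps` the iterate covariance is generally too small or too large relative to the posterior; the paper's contribution is to derive, from the Ornstein-Uhlenbeck model, the constant rate (and preconditioning) that makes the two agree.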

research 04/13/2017
Stochastic Gradient Descent as Approximate Bayesian Inference
Stochastic Gradient Descent with a constant learning rate (constant SGD)...

research 11/20/2019
Bayesian interpretation of SGD as Ito process
The current interpretation of stochastic gradient descent (SGD) as a sto...

research 12/24/2022
Visualizing Information Bottleneck through Variational Inference
The Information Bottleneck theory provides a theoretical and computation...

research 11/11/2021
Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization
Stochastic approximation (SA) and stochastic gradient descent (SGD) algo...

research 07/22/2018
PaloBoost: An Overfitting-robust TreeBoost with Out-of-Bag Sample Regularization Techniques
Stochastic Gradient TreeBoost is often found in many winning solutions i...

research 04/06/2015
Early Stopping is Nonparametric Variational Inference
We show that unconverged stochastic gradient descent can be interpreted ...

research 06/06/2023
Machine learning in and out of equilibrium
The algorithms used to train neural networks, like stochastic gradient d...
