Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

by   Guodong Zhang, et al.

Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable due to the use of Metropolis-Hastings (MH) correction steps. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective using gradient-based methods. To this end, we propose a differentiable AIS algorithm by abandoning MH steps, which further unlocks mini-batch computation. We provide a detailed convergence analysis for Bayesian linear regression which goes beyond previous analyses by explicitly accounting for non-perfect transitions. Using this analysis, we prove that our algorithm is consistent in the full-batch setting and provide a sublinear convergence rate. However, we show that the algorithm is inconsistent when mini-batch gradients are used due to a fundamental incompatibility between the goals of last-iterate convergence to the posterior and elimination of the pathwise stochastic error. This result is in stark contrast to our experience with stochastic optimization and stochastic gradient Langevin dynamics, where the effects of gradient noise can be washed out by taking more steps of a smaller size. Our negative result relies crucially on our explicit consideration of convergence to the stationary distribution, and it helps explain the difficulty of developing practically effective AIS-like algorithms that exploit mini-batch gradients.


page 1

page 2

page 3

page 4


Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization

In the paper, we study the mini-batch stochastic ADMMs (alternating dire...

Stochastic Gradient Annealed Importance Sampling for Efficient Online Marginal Likelihood Estimation

We consider estimating the marginal likelihood in settings with independ...

Convergence of the mini-batch SIHT algorithm

The Iterative Hard Thresholding (IHT) algorithm has been considered exte...

Variance Regularization for Accelerating Stochastic Optimization

While nowadays most gradient-based optimization methods focus on explori...

SGD: General Analysis and Improved Rates

We propose a general yet simple theorem describing the convergence of SG...

Policy Gradients for CVaR-Constrained MDPs

We study a risk-constrained version of the stochastic shortest path (SSP...

Revisiting the Effects of Stochasticity for Hamiltonian Samplers

We revisit the theoretical properties of Hamiltonian stochastic differen...

Please sign up or login with your details

Forgot password? Click here to reset