Scalable Approximate Inference and Some Applications
Approximate inference in probabilistic models is a fundamental task in machine learning. It provides powerful tools for Bayesian reasoning, decision making, and Bayesian deep learning. The main goal is to estimate the expectation of functions of interest with respect to a target distribution. For high-dimensional probability models and large datasets, efficient approximate inference becomes critically important. In this thesis, we propose a new framework for approximate inference that combines the advantages of three existing frameworks and overcomes their limitations. The four algorithms we propose are motivated by recent computational progress on Stein's method. They apply to both continuous and discrete distributions, in settings where gradient information of the target distribution is either available or unavailable. We provide theoretical analysis that proves the convergence of the proposed algorithms. Our adaptive importance sampling (IS) algorithm iteratively improves the importance proposal by functionally decreasing the KL divergence between the updated proposal and the target. When the gradient of the target is unavailable, our sampling algorithm leverages the gradient of a surrogate model and corrects the induced bias with importance weights; it significantly outperforms other gradient-free sampling algorithms. In addition, our theoretical results enable goodness-of-fit testing on discrete distributions. At the end of the thesis, we propose an importance-weighted method that efficiently aggregates local models in distributed learning with one-shot communication. Results on simulated and real datasets indicate the statistical efficiency and wide applicability of our algorithms.
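As a point of reference for the stated goal of estimating an expectation under a target distribution, the following is a minimal sketch of textbook self-normalized importance sampling. It is not the adaptive IS algorithm or any other algorithm proposed in the thesis. The function name `snis_expectation`, the Gaussian target and proposal, and the sample size are all illustrative assumptions.

```python
import math
import random


def snis_expectation(f, log_target, log_proposal, samples):
    """Self-normalized importance sampling estimate of E_p[f(x)].

    log_target and log_proposal may each omit their normalizing
    constants, because the weights are normalized to sum to one.
    """
    log_w = [log_target(x) - log_proposal(x) for x in samples]
    shift = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return sum(wi * f(x) for wi, x in zip(w, samples)) / total


# Hypothetical example: target N(1, 1), known only up to a constant,
# and a wider proposal N(0, 2^2) that is easy to sample from.
rng = random.Random(0)
samples = [rng.gauss(0.0, 2.0) for _ in range(50_000)]

log_target = lambda x: -0.5 * (x - 1.0) ** 2
log_proposal = lambda x: -0.5 * (x / 2.0) ** 2

# Estimate of the target mean, whose true value is 1.
estimate = snis_expectation(lambda x: x, log_target, log_proposal, samples)
print(estimate)
```

The quality of such an estimate depends on how well the proposal matches the target, which is the motivation for improving the proposal iteratively, as the adaptive IS algorithm described above does.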