Bayesian sparsification for deep neural networks with Bayesian model reduction

by Dimitrije Marković, et al.

Deep learning's immense capabilities are often constrained by the complexity of its models, creating a growing demand for effective sparsification techniques. Bayesian sparsification for deep learning has emerged as a crucial approach, enabling the design of models that are both computationally efficient and competitive in performance across a range of deep learning applications. The state of the art in Bayesian sparsification of deep neural networks combines structural shrinkage priors on model weights with an approximate inference scheme based on black-box stochastic variational inference. However, model inversion of the full generative model is exceptionally computationally demanding, especially when compared to standard deep learning of point estimates. In this context, we advocate the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning model weights. As a generalization of the Savage-Dickey ratio, BMR allows post-hoc elimination of redundant model weights based on posterior estimates obtained under a straightforward (non-hierarchical) generative model. Our comparative study highlights the computational efficiency and pruning rate of the BMR method relative to the established stochastic variational inference (SVI) scheme applied to the full hierarchical generative model. We illustrate the potential of BMR to prune model parameters across deep learning architectures ranging from classical networks such as LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
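To make the BMR pruning idea concrete, the sketch below works out the Gaussian special case: given a factorized Gaussian posterior N(mu, v) for each weight obtained under a flat Gaussian prior N(0, v0), Bayesian model reduction gives the change in free energy (log evidence) when the prior is reduced to a near-spike N(0, v_tilde), in closed form and without refitting. Weights whose removal increases the evidence are pruned. This is an illustrative sketch under assumed notation, not the paper's implementation; the function names and default variances (v0=1, v_tilde=1e-6) are hypothetical choices.

```python
import numpy as np

def delta_free_energy(mu, v, v0=1.0, v_tilde=1e-6):
    """Change in log evidence when a weight's Gaussian prior N(0, v0) is
    reduced to a near-spike N(0, v_tilde), given the factorized Gaussian
    posterior N(mu, v). Follows the standard BMR identity
        dF = log E_q[ p_reduced(w) / p(w) ],
    evaluated analytically for Gaussians. As v_tilde -> 0 this recovers the
    Savage-Dickey ratio: dF -> 0.5*log(v0/v) - mu**2 / (2*v).
    """
    mu, v = np.asarray(mu, dtype=float), np.asarray(v, dtype=float)
    # Precision of the reduced posterior (product of posterior, reduced
    # prior, and inverse full prior).
    pi_s = 1.0 / v + 1.0 / v_tilde - 1.0 / v0
    return (0.5 * np.log(v0 / v_tilde)
            - 0.5 * np.log(v * pi_s)
            - 0.5 * (mu ** 2 / v) * (1.0 - 1.0 / (v * pi_s)))

def bmr_prune_mask(mu, v, v0=1.0, v_tilde=1e-6):
    """Boolean mask of weights to prune: those whose elimination
    increases the model evidence (dF > 0)."""
    return delta_free_energy(mu, v, v0, v_tilde) > 0.0

# A weight concentrated near zero is pruned; a clearly non-zero one is kept.
mu = np.array([0.0, 2.0])
v = np.array([0.1, 0.1])
print(bmr_prune_mask(mu, v))  # [ True False]
```

The post-hoc nature of the rule is what makes it cheap: the posterior means and variances are computed once under the simple non-hierarchical prior, and the evidence change for every candidate reduced model follows analytically.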


