Learning Sparse Neural Networks through L_0 Regularization

12/04/2017
by Christos Louizos, et al.

We propose a practical method for L_0 norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero. Such regularization is interesting since (1) it can greatly speed up training and inference, and (2) it can improve generalization. AIC and BIC, well-known model selection criteria, are special cases of L_0 regularization. However, since the L_0 norm of the weights is non-differentiable, we cannot incorporate it directly as a regularization term in the objective function. We propose a solution through the inclusion of a collection of non-negative stochastic gates, which collectively determine which weights to set to zero. We show that, somewhat surprisingly, for certain distributions over the gates, the expected L_0 norm of the resulting gated weights is differentiable with respect to the distribution parameters. We further propose the hard concrete distribution for the gates, which is obtained by "stretching" a binary concrete distribution and then transforming its samples with a hard-sigmoid. The parameters of the distribution over the gates can then be jointly optimized with the original network parameters. As a result, our method allows for straightforward and efficient learning of model structures with stochastic gradient descent and enables conditional computation in a principled way. We perform various experiments to demonstrate the effectiveness of the resulting approach and regularizer.
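To make the gating mechanism concrete, here is a minimal PyTorch-style sketch written from the description above rather than from the authors' code: a binary concrete sample is "stretched" to an assumed interval (gamma, zeta) = (-0.1, 1.1) and clamped with a hard-sigmoid, and the expected L_0 penalty is computed in closed form so it can be optimized jointly with the network weights. The temperature beta = 2/3, the stretch limits, and the class name HardConcreteGate are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn


class HardConcreteGate(nn.Module):
    """Stochastic gates z in [0, 1] attached to a group of weights.

    A binary concrete (relaxed Bernoulli) sample is stretched to the interval
    (gamma, zeta) and clamped with a hard-sigmoid, so exact zeros and ones
    occur with non-zero probability. The expected L_0 penalty is the
    probability that each gate is non-zero, which is differentiable in the
    gate parameters log_alpha.
    """

    def __init__(self, n_gates, beta=2 / 3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_gates))  # per-gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta  # assumed temperature / stretch limits

    def sample(self):
        # Reparameterized binary concrete sample.
        u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta)
        # Stretch to (gamma, zeta), then clamp to [0, 1] (hard-sigmoid).
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_l0(self):
        # P(z != 0) summed over gates: the differentiable surrogate of the L_0 norm.
        return torch.sigmoid(self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)).sum()
```

In training, a sampled gate multiplies its weight (or a whole neuron or filter), and a term such as lam * gate.expected_l0() is added to the loss; at test time one would use a deterministic gate, e.g. sigmoid(log_alpha) * (zeta - gamma) + gamma clamped to [0, 1], and prune the weights whose gates are exactly zero. Gating whole groups of weights is what enables the conditional-computation speedups mentioned in the abstract.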


Related research

12/07/2020
DiffPrune: Neural Network Pruning with Deterministic Approximate Binary Gates and L_0 Regularization
Modern neural network architectures typically have many millions of para...

10/08/2019
Differentiable Sparsification for Deep Neural Networks
A deep neural network has relieved the burden of feature engineering by ...

06/23/2020
Embedding Differentiable Sparsity into Deep Neural Network
In this paper, we propose embedding sparsity into the structure of deep ...

04/09/2019
L_0-ARM: Network Sparsification via Stochastic Binary Optimization
We consider network sparsification as an L_0-norm regularized binary opt...

02/02/2023
Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
The top-k operator returns a k-sparse vector, where the non-zero values ...

08/23/2023
A multiobjective continuation method to compute the regularization path of deep neural networks
Sparsity is a highly desired feature in deep neural networks (DNNs) sinc...

09/21/2023
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Stochastic Gradient Descent (SGD), a widely used optimization algorithm ...
