ADAM-ADMM: A Unified, Systematic Framework of Structured Weight Pruning for DNNs

07/29/2018
by Tianyun Zhang, et al.

Weight pruning methods for deep neural networks (DNNs) have been demonstrated to achieve a good model pruning ratio without loss of accuracy, thereby alleviating the significant computation/storage requirements of large-scale DNNs. Structured weight pruning methods have been proposed to overcome the limitation of irregular network structure and have demonstrated actual GPU acceleration. However, the pruning ratio (degree of sparsity) and GPU acceleration are limited (to less than 50%) when accuracy needs to be maintained. In this work, we overcome these pruning ratio and GPU acceleration limitations by proposing a unified, systematic framework of structured weight pruning for DNNs, named ADAM-ADMM (Adaptive Moment Estimation-Alternating Direction Method of Multipliers). It is a framework that can be used to induce different types of structured sparsity, such as filter-wise, channel-wise, and shape-wise sparsity, as well as non-structured sparsity. The proposed framework incorporates stochastic gradient descent with ADMM and can be understood as a dynamic regularization method in which the regularization target is analytically updated in each iteration. A significant improvement in weight pruning ratio is achieved without loss of accuracy, along with a fast convergence rate. With a small sparsity degree of 33%, we achieve an accuracy enhancement for the AlexNet (CaffeNet) model. This is obtained by mitigation of overfitting. Without loss of accuracy on the AlexNet model, we achieve 2.6 times and 3.65 times average measured speedup on two GPUs, clearly outperforming the prior work. The average speedups reach 2.77 times and 7.5 times when allowing a moderate accuracy loss of 2%; in that case the compression for convolutional layers is 13.2 times, corresponding to 10.5 times CPU speedup. Our models and codes are released at https://github.com/KaiqiZhang/ADAM-ADMM
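The dynamic-regularization view mentioned in the abstract can be made concrete with a small sketch: at each ADMM iteration, an auxiliary variable Z is obtained by analytically projecting W + U onto the structured-sparsity constraint set (here, keeping the strongest filters by l2 norm), the weights W are updated by gradient steps on the loss plus a quadratic penalty (rho/2)||W - Z + U||^2, and the scaled dual variable U is updated to pull W and Z together. The NumPy sketch below is illustrative only; the toy least-squares "layer", the filter-wise projection rule, and the hyperparameters (rho, learning rate, number of kept filters) are assumptions for demonstration, not the paper's released code, and plain SGD stands in for the Adam-based subproblem solver.

```python
# Minimal sketch of ADMM-based structured (filter-wise) weight pruning.
# Toy problem and hyperparameters are illustrative assumptions, not the released implementation.
import numpy as np

rng = np.random.default_rng(0)

def project_filter_sparsity(W, keep_filters):
    """Euclidean projection onto {W : at most keep_filters filters are nonzero}.
    W has shape (num_filters, fan_in); filters are ranked by their l2 norm."""
    norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(norms)[-keep_filters:]      # indices of the strongest filters
    Z = np.zeros_like(W)
    Z[keep] = W[keep]
    return Z

def toy_loss_grad(W, X, Y):
    """Gradient of a toy least-squares loss 0.5*||X W^T - Y||_F^2 (stand-in for a DNN loss)."""
    return (X @ W.T - Y).T @ X

# Toy "layer": 8 filters with fan-in 16; keep only 4 filters (50% structured sparsity).
W = rng.standard_normal((8, 16))
X = rng.standard_normal((64, 16))
Y = rng.standard_normal((64, 8))
U = np.zeros_like(W)                              # scaled dual variable
rho, lr, keep_filters = 1e-2, 1e-3, 4

for admm_iter in range(30):
    # Z-update: analytic projection of W + U onto the structured-sparsity set
    # (this is the "regularization target" updated in each iteration).
    Z = project_filter_sparsity(W + U, keep_filters)
    # W-update: gradient steps on loss + (rho/2)*||W - Z + U||_F^2.
    for _ in range(50):
        grad = toy_loss_grad(W, X, Y) + rho * (W - Z + U)
        W -= lr * grad
    # Dual update drives W and Z toward agreement as iterations proceed.
    U += W - Z

# Hard-prune to the final structured mask and report the achieved sparsity.
W_pruned = project_filter_sparsity(W, keep_filters)
print("nonzero filters:", int((np.linalg.norm(W_pruned, axis=1) > 0).sum()), "of", W.shape[0])
```

The same loop applies to other sparsity types by swapping the projection step, e.g. ranking channels or shapes (columns of the unfolded weight matrix) instead of filters.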
