A Novel Architecture Slimming Method for Network Pruning and Knowledge Distillation

by   Dongqi Wang, et al.
Zhejiang University

Network pruning and knowledge distillation are two widely-known model compression methods that efficiently reduce computation cost and model size. A common problem in both pruning and distillation is to determine compressed architecture, i.e., the exact number of filters per layer and layer configuration, in order to preserve most of the original model capacity. In spite of the great advances in existing works, the determination of an excellent architecture still requires human interference or tremendous experimentations. In this paper, we propose an architecture slimming method that automates the layer configuration process. We start from the perspective that the capacity of the over-parameterized model can be largely preserved by finding the minimum number of filters preserving the maximum parameter variance per layer, resulting in a thin architecture. We formulate the determination of compressed architecture as a one-step orthogonal linear transformation, and integrate principle component analysis (PCA), where the variances of filters in the first several projections are maximized. We demonstrate the rationality of our analysis and the effectiveness of the proposed method through extensive experiments. In particular, we show that under the same overall compression rate, the compressed architecture determined by our method shows significant performance gain over baselines after pruning and distillation. Surprisingly, we find that the resulting layer-wise compression rates correspond to the layer sensitivities found by existing works through tremendous experimentations.


page 1

page 2

page 3

page 4


A "Network Pruning Network" Approach to Deep Model Compression

We present a filter pruning approach for deep model compression, using a...

Few Shot Network Compression via Cross Distillation

Model compression has been widely adopted to obtain light-weighted deep ...

Can Model Compression Improve NLP Fairness

Model compression techniques are receiving increasing attention; however...

Network compression and faster inference using spatial basis filters

We present an efficient alternative to the convolutional layer through u...

Energy-efficient Knowledge Distillation for Spiking Neural Networks

Spiking neural networks (SNNs) have been gaining interest as energy-effi...

MICIK: MIning Cross-Layer Inherent Similarity Knowledge for Deep Model Compression

State-of-the-art deep model compression methods exploit the low-rank app...

Win the Lottery Ticket via Fourier Analysis: Frequencies Guided Network Pruning

With the remarkable success of deep learning recently, efficient network...

Please sign up or login with your details

Forgot password? Click here to reset