Accelerator-Aware Pruning for Convolutional Neural Networks

04/26/2018
by Hyeong-Ju Kang, et al.

Convolutional neural networks have shown tremendous performance in computer vision tasks, but their excessive number of weights and operations prevents them from being adopted in embedded environments. One of the solutions involves pruning, where unimportant weights are forced to zero. Many pruning schemes have been proposed, but they have focused mainly on the number of pruned weights and have hardly considered ASIC or FPGA accelerator architectures. When pruned networks run on such accelerators, this lack of architectural consideration causes inefficiency problems, including internal buffer misalignment and load imbalance. This paper proposes a new pruning scheme that reflects accelerator architectures: pruning is performed so that the same number of weights remain in each weight group corresponding to activations fetched simultaneously, which resolves the inefficiency problems. Even with this constraint, the proposed scheme reaches a pruning ratio similar to that of previous unconstrained pruning schemes, not only on AlexNet and VGG16 but also on state-of-the-art very deep networks such as ResNet. Furthermore, it demonstrates a comparable pruning ratio on slimmed networks that were already pruned channel-wise. In addition to improving the efficiency of previous sparse accelerators, the proposed pruning scheme can also be used to reduce the logic complexity of sparse accelerators.
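The core constraint can be illustrated with a minimal sketch. The snippet below assumes magnitude-based selection and contiguous groups along the flattened weight tensor; the function name, numpy implementation, and group layout are illustrative, not the paper's actual code (in the paper, groups follow the accelerator's parallel activation fetch).

```python
import numpy as np

def accelerator_aware_prune(weights, group_size, keep_per_group):
    """Keep only the `keep_per_group` largest-magnitude weights in each
    group of `group_size` consecutive weights; zero the rest.

    Sketch assumptions: groups are contiguous along the flattened weight
    tensor, selection is by magnitude, and weights.size is divisible by
    group_size.
    """
    flat = weights.reshape(-1, group_size).copy()   # one row per weight group
    # Indices of the smallest-magnitude entries in each group
    drop = np.argsort(np.abs(flat), axis=1)[:, :group_size - keep_per_group]
    np.put_along_axis(flat, drop, 0.0, axis=1)      # zero the pruned weights
    return flat.reshape(weights.shape)

# Example: prune a conv layer to 2 surviving weights per group of 8 (75% sparsity)
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w_pruned = accelerator_aware_prune(w, group_size=8, keep_per_group=2)
assert ((w_pruned.reshape(-1, 8) != 0).sum(axis=1) <= 2).all()
```

Because every group retains exactly the same number of nonzero weights, the accelerator's parallel lanes stay load-balanced and the compressed weights align with the internal buffer width, which is how the scheme avoids the inefficiency problems described above.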
