PENNI: Pruned Kernel Sharing for Efficient CNN Inference

05/14/2020
by   Shiyu Li, et al.

Although state-of-the-art (SOTA) CNNs achieve outstanding performance on various tasks, their high computation demand and massive number of parameters make it difficult to deploy them on resource-constrained devices. Previous works on CNN acceleration use low-rank approximation of the original convolution layers to reduce computation cost. However, these methods are difficult to apply to sparse models, which limits the achievable speedup because redundancy within the CNN model is not fully exploited. We argue that kernel-granularity decomposition can be performed under a low-rank assumption while still exploiting the redundancy within the remaining compact coefficients. Based on this observation, we propose PENNI, a CNN model compression framework that achieves model compactness and hardware efficiency simultaneously by (1) implementing kernel sharing in convolution layers via a small number of basis kernels and (2) alternately adjusting the bases and coefficients under sparse constraints. Experiments show that we can prune 97% of the parameters and achieve a 44% reduction in inference latency.
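The two ingredients the abstract names, expressing every kernel in a layer as a combination of a few shared basis kernels and then sparsifying the coefficients, can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: all function names are ours, the basis is obtained here by plain truncated SVD, and the sparsity step is simple magnitude thresholding rather than the paper's alternating training procedure.

```python
import numpy as np

def decompose_kernels(weights, num_basis):
    """Express (C_out, C_in, k, k) conv weights as coefficients over a
    small set of shared k x k basis kernels, via truncated SVD.
    Returns basis (num_basis, k, k) and coeffs (C_out, C_in, num_basis)."""
    c_out, c_in, k, _ = weights.shape
    flat = weights.reshape(c_out * c_in, k * k)        # one row per kernel
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:num_basis].reshape(num_basis, k, k)    # shared basis kernels
    coeffs = (u[:, :num_basis] * s[:num_basis]).reshape(c_out, c_in, num_basis)
    return basis, coeffs

def prune_coeffs(coeffs, ratio):
    """Zero out roughly the smallest-magnitude fraction `ratio` of the
    coefficients (a stand-in for the paper's sparse constraint)."""
    thresh = np.quantile(np.abs(coeffs), ratio)
    return np.where(np.abs(coeffs) < thresh, 0.0, coeffs)

def reconstruct(basis, coeffs):
    """Rebuild (C_out, C_in, k, k) weights from the shared basis."""
    return np.einsum('oib,bxy->oixy', coeffs, basis)

# Toy layer: 16 output channels, 8 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
w = rng.standard_normal((16, 8, 3, 3))
basis, coeffs = decompose_kernels(w, num_basis=5)
w_hat = reconstruct(basis, prune_coeffs(coeffs, ratio=0.5))
```

Storing 5 basis kernels plus sparse per-kernel coefficients replaces 16 x 8 dense 3x3 kernels; the reconstruction error shrinks as `num_basis` approaches k*k, which is why a low-rank assumption on the kernels makes the compact form viable.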


Related research

- Learning Low-Rank Approximation for CNNs (05/24/2019)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution (09/04/2020)
- TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition (11/07/2022)
- Convolutional neural networks with low-rank regularization (11/19/2015)
- Latency-Memory Optimized Splitting of Convolution Neural Networks for Resource Constrained Edge Devices (07/19/2021)
- FALCON: Fast and Lightweight Convolution for Compressing and Accelerating CNN (09/25/2019)
- Penetrating the Fog: the Path to Efficient CNN Models (10/09/2018)
