High Performance Convolution Using Sparsity and Patterns for Inference in Deep Convolutional Neural Networks

by Hossam Amer, et al.

Deploying deep Convolutional Neural Networks (CNNs) is impacted by their memory footprint and speed requirements, which mainly come from convolution. Widely used convolution algorithms, im2col and MEC, produce a lowered matrix from an activation map by redundantly storing the map's elements included at horizontal and/or vertical kernel overlappings, without considering the sparsity of the map. Exploiting this sparsity, this paper proposes two new convolution algorithms, dubbed Compressed Pattern Overlap (CPO) and Compressed Pattern Sets (CPS), that simultaneously decrease the memory footprint and increase the inference speed while preserving the accuracy. CPO recognizes non-zero elements (NZEs) at horizontal and vertical overlappings in the activation maps. CPS further improves the memory savings of CPO by compressing the index positions of neighboring NZEs. In both algorithms, channels/regions of the activation maps with all zeros are skipped. CPO/CPS then performs convolution via Sparse Matrix-Vector Multiplication (SpMV) on their sparse representations. Experimental results conducted on CPUs show that average per-layer time savings reach up to 63% with respect to im2col. In some layers, our average per-layer CPO/CPS time savings are better by 28% with respect to our implementation of MEC. For a given CNN's inference, we select offline, for each convolution layer, the fastest convolution algorithm among CPO, CPS, and im2col. Our algorithms were selected for up to 56% of the non-pointwise convolutional layers. Our offline selections yield CNN inference time savings of up to 9%.
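To make the contrast concrete, the sketch below (not the authors' CPO/CPS code; a minimal NumPy illustration under assumed single-channel, stride-1, no-padding conventions) shows the im2col lowering the abstract describes, where overlapping patch elements are stored redundantly, and a naive sparsity-aware variant that performs the same convolution as a sparse dot product, multiplying only the non-zero elements (NZEs) of each patch and skipping all-zero patches entirely:

```python
import numpy as np

def im2col(x, kh, kw):
    """Lower activation map x (H x W) into a matrix whose columns are
    flattened kernel-sized patches. Elements under overlapping kernel
    positions are stored redundantly (the memory cost the paper targets)."""
    H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, oh * ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv_im2col(x, k):
    """Dense convolution: lower x, then one matrix-vector product."""
    kh, kw = k.shape
    out = k.ravel() @ im2col(x, kh, kw)
    return out.reshape(x.shape[0] - kh + 1, x.shape[1] - kw + 1)

def conv_sparse(x, k):
    """Sparsity-aware convolution in the spirit of SpMV: per patch,
    multiply only the NZEs and skip all-zero patches/regions."""
    kh, kw = k.shape
    cols = im2col(x, kh, kw)
    kvec = k.ravel()
    out = np.zeros(cols.shape[1], dtype=x.dtype)
    for c in range(cols.shape[1]):
        nz = np.nonzero(cols[:, c])[0]  # NZE index positions in this patch
        if nz.size == 0:                # all-zero region: skipped
            continue
        out[c] = kvec[nz] @ cols[nz, c]
    return out.reshape(x.shape[0] - kh + 1, x.shape[1] - kw + 1)
```

On a highly sparse activation map (e.g., the output of a ReLU layer) the two functions return identical results, but the sparse path touches only the stored NZEs; CPO/CPS go further by building a compact representation of those NZE positions up front rather than scanning dense columns.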


Reliable Identification of Redundant Kernels for Convolutional Neural Network Compression

To compress deep convolutional neural networks (CNNs) with large memory ...

An Empirical Study on Position of the Batch Normalization Layer in Convolutional Neural Networks

In this paper, we have studied how the training of the convolutional neu...

SBNet: Sparse Blocks Network for Fast Inference

Conventional deep convolutional neural networks (CNNs) apply convolution...

Memory Bounded Deep Convolutional Networks

In this work, we investigate the use of sparsity-inducing regularizers d...

Parsimonious Inference on Convolutional Neural Networks: Learning and applying on-line kernel activation rules

A new, radical CNN design approach is presented in this paper, consideri...

The Power of Sparsity in Convolutional Neural Networks

Deep convolutional networks are well-known for their high computational ...

Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in Convolutional Networks

While CNNs naturally lend themselves to densely sampled data, and sophis...
