Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs

by   Anup Sarma, et al.

Machine/deep-learning (ML/DL) based techniques are emerging as a driving force behind many cutting-edge technologies, achieving high accuracy on computer vision workloads such as image classification and object detection. However, training these models involving large parameters is both time-consuming and energy-hogging. In this regard, several prior works have advocated for sparsity to speed up the of DL training and more so, the inference phase. This work begins with the observation that during training, sparsity in the forward and backward passes are correlated. In that context, we investigate two types of sparsity (input and output type) inherent in gradient descent-based optimization algorithms and propose a hardware micro-architecture to leverage the same. Our experimental results use five state-of-the-art CNN models on the Imagenet dataset, and show back propagation speedups in the range of 1.69× to 5.43×, compared to the dense baseline execution. By exploiting sparsity in both the forward and backward passes, speedup improvements range from 1.68× to 3.30× over the sparsity-agnostic baseline execution. Our work also achieves significant reduction in training iteration time over several previously proposed dense as well as sparse accelerator based platforms, in addition to achieving order of magnitude energy efficiency improvements over GPU based execution.


page 5

page 6

page 7


PillarAcc: Sparse PointPillars Accelerator for Real-Time Point Cloud 3D Object Detection on Edge Devices

3D object detection using point cloud (PC) data is vital for autonomous ...

SparseNN: An Energy-Efficient Neural Network Accelerator Exploiting Input and Output Sparsity

Contemporary Deep Neural Network (DNN) contains millions of synaptic con...

SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training

Training Convolutional Neural Networks (CNNs) usually requires a large n...

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs

Deep Learning (DL) acceleration support in CPUs has recently gained a lo...

Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training

Exploiting sparsity enables hardware systems to run neural networks fast...

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Currently, Machine Learning (ML) is becoming ubiquitous in everyday life...

SparseTrain:Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors

Our community has greatly improved the efficiency of deep learning appli...

Please sign up or login with your details

Forgot password? Click here to reset