Kraken: An Efficient Engine with a Uniform Dataflow for Deep Neural Networks

12/06/2021
by G. Abarajithan et al.

Deep neural networks (DNNs) have been successfully employed in a multitude of applications with remarkable performance. As such performance comes at a significant computational cost, many embedded applications demand fast and efficient hardware accelerators for DNNs. Previously proposed application-specific integrated circuit (ASIC) architectures strive to utilize arrays of hundreds of processing elements (PEs) and to reduce power-hungry DRAM accesses using multiple dataflows, which require complex PE architectures. These complex PEs consume significant area and reduce the maximum clock frequency. This paper introduces the Kraken architecture, which optimally processes the convolutional layers, fully-connected layers, and matrix products of any DNN through a hardware-friendly uniform dataflow. This enables maximal data reuse of weights, inputs, and outputs, with a bare-bones PE design and on-the-fly dynamic reconfiguration. Kraken, implemented in 65-nm CMOS technology at 400 MHz, packs 672 PEs in 7.3 mm², with a peak performance of 537.6 Gops. Kraken processes the convolutional layers of AlexNet, VGG-16, and ResNet-50 at 336.6, 17.5, and 64.2 frames/s, respectively, outperforming state-of-the-art ASIC architectures in terms of overall performance efficiency, DRAM accesses, arithmetic intensity, and throughput, with 5.8x more Gops/mm² and 1.6x more Gops/W.
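The key idea of a uniform dataflow — handling convolutional layers, fully-connected layers, and matrix products with one computation pattern — can be illustrated by lowering a convolution to a matrix product (the classic im2col transformation), so that every layer type feeds the same GEMM kernel. This is a minimal NumPy sketch of that general concept, not Kraken's actual hardware dataflow, which the abstract does not detail; all function names and shapes here are illustrative assumptions.

```python
import numpy as np

def gemm(a, b):
    # The single uniform kernel: every layer type reduces to this product.
    return a @ b

def im2col(x, kh, kw):
    # Lower a (H, W, C) input into a patch matrix so that convolution
    # becomes a plain matrix product (stride 1, no padding, for brevity).
    h, w, c = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((oh * ow, kh * kw * c))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw, :].ravel()
    return cols

rng = np.random.default_rng(0)

# Convolutional layer: 3x3 kernels, 3 input channels, 16 output channels.
x = rng.random((8, 8, 3))
w_conv = rng.random((3 * 3 * 3, 16))
y_conv = gemm(im2col(x, 3, 3), w_conv)   # shape (36, 16): 6x6 outputs

# Fully-connected layer: the same gemm kernel, no lowering needed.
w_fc = rng.random((16, 10))
y_fc = gemm(y_conv, w_fc)                # shape (36, 10)
```

Because both layer types funnel through one kernel, a hardware PE array only needs to implement that single pattern — which is the motivation the abstract gives for the bare-bones PE design.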

