A Fresh Perspective on DNN Accelerators by Performing Holistic Analysis Across Paradigms

by Tom Glint et al.

Traditional computers with the von Neumann architecture are unable to meet the latency and scalability demands of Deep Neural Network (DNN) workloads. Various DNN accelerators based on the Conventional compute Hardware Accelerator (CHA), Near-Data-Processing (NDP), and Processing-in-Memory (PIM) paradigms have been proposed to meet these challenges. Our goal in this work is to perform a rigorous comparison among state-of-the-art accelerators from these DNN accelerator paradigms. For our analysis, we use unique layers from MobileNet, ResNet, BERT, and DLRM of the MLPerf Inference benchmark. Our detailed models are based on hardware-realized state-of-the-art designs. We observe that for memory-intensive Fully Connected Layer (FCL) DNNs, the NDP-based accelerator is 10.6x faster than the state-of-the-art CHA and 39.9x faster than the PIM-based accelerator for inference. For compute-intensive image classification and object detection DNNs, the state-of-the-art CHA is ~10x faster than the NDP-based accelerator and ~2000x faster than the PIM-based accelerator. PIM-based accelerators are suitable for DNN applications where energy is the primary constraint (~2.7x and ~21x lower energy for CNN and FCL applications, respectively, than conventional ASIC systems). Further, through a detailed sensitivity analysis of the relevant components in CHA-, NDP-, and PIM-based accelerators, we identify architectural changes (such as increased memory bandwidth and buffer reorganization) that can increase throughput (up to linearly) and lower energy (up to linearly) for ML applications.
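The speedup ratios reported above can be placed on a common scale by normalizing each paradigm's inference latency to the CHA baseline. A minimal sketch of that normalization, using only the ratios stated in the abstract (the function name and dictionary layout are our own, for illustration):

```python
# Relative inference latencies, normalized so CHA = 1.0, derived from the
# speedup factors reported in the abstract:
#   FCL workloads: NDP is 10.6x faster than CHA, and 39.9x faster than PIM.
#   CNN workloads: CHA is ~10x faster than NDP and ~2000x faster than PIM.

def relative_latencies():
    fcl = {
        "CHA": 1.0,
        "NDP": 1.0 / 10.6,    # NDP finishes in ~1/10.6 of the CHA time
        "PIM": 39.9 / 10.6,   # PIM is 39.9x slower than NDP on FCL
    }
    cnn = {
        "CHA": 1.0,
        "NDP": 10.0,          # CHA is ~10x faster than NDP on CNN
        "PIM": 2000.0,        # CHA is ~2000x faster than PIM on CNN
    }
    return fcl, cnn

if __name__ == "__main__":
    fcl, cnn = relative_latencies()
    for name, table in (("FCL", fcl), ("CNN", cnn)):
        fastest = min(table, key=table.get)
        print(f"{name}: fastest paradigm is {fastest}")
```

This makes the headline result of the comparison explicit: the fastest paradigm flips between NDP (memory-intensive FCL) and CHA (compute-intensive CNN), which is why a holistic cross-paradigm analysis is needed.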



