FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks

01/19/2022
by Shien Zhu, et al.

Convolutional Neural Networks (CNNs) demonstrate great performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs offer higher accuracy and better sparsity, IMC acceleration for TWNs has received limited attention. TWNs are inefficient on existing IMC devices because their sparsity is not well utilized and the addition operation is not efficient. In this paper, we propose FAT, a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which exploits the sparsity of TWNs to skip null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier that avoids the time overhead of both carry propagation and writing the carry back to the memory cells. Third, we propose a Combined-Stationary data mapping that reduces the data movement of both activations and weights and increases the parallelism of memory columns. Simulation results show that for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area efficiency compared with the state-of-the-art IMC accelerator ParaPIM. On networks with 80% average sparsity, FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM.
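The abstract describes the accelerator at the hardware-architecture level; a minimal software sketch of the underlying arithmetic may help clarify why TWNs map well to in-memory addition. The Python snippet below is a hypothetical illustration, not code from the paper: it shows how a ternary-weight dot product reduces to signed additions, and how zero weights can simply be skipped, which is the sparsity that the proposed Sparse Addition Control Unit exploits in memory.

import numpy as np

def ternary_dot_as_additions(activations, weights):
    """
    Conceptual, software-level sketch (not the hardware accelerator):
    a ternary-weight dot product computed with additions and subtractions only.
    Weights are restricted to {-1, 0, +1}, so each product a*w reduces to
    +a, -a, or a skipped (null) operation; the skip mirrors the role of a
    sparsity control unit that bypasses zero-weight operations.
    """
    acc = 0
    for a, w in zip(activations, weights):
        if w == 0:
            continue          # zero weight: skip the null operation entirely
        elif w == 1:
            acc += a          # +1 weight: accumulate the activation
        else:                 # w == -1
            acc -= a          # -1 weight: subtract the activation
    return acc

# Toy example: a sparse ternary weight vector (5 of 8 weights are zero),
# so only 3 additions/subtractions are actually performed.
acts = np.array([3, 1, 4, 1, 5, 9, 2, 6])
wts  = np.array([1, 0, -1, 0, 0, 1, 0, -1])
print(ternary_dot_as_additions(acts, wts))   # 3 - 4 + 9 - 6 = 2

The higher the weight sparsity, the more additions are skipped, which is why the reported end-to-end gains over ParaPIM are evaluated on networks with high average sparsity.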


