SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation

by   fcq, et al.

We present SmartExchange, an algorithm-hardware co-design framework to trade higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all power-of-2. To our best knowledge, this algorithm is the first formulation that integrates three mainstream model compression ideas: sparsification or pruning, decomposition, and quantization, into one unified framework. The resulting sparse and readily-quantized DNN thus enjoys greatly reduced energy consumption in data movement as well as weight storage. On top of that, we further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance. Extensive experiments show that 1) on the algorithm level, SmartExchange outperforms state-of-the-art compression techniques, including merely sparsification or pruning, decomposition, and quantization, in various ablation studies based on nine DNN models and four datasets; and 2) on the hardware level, the proposed SmartExchange based accelerator can improve the energy efficiency by up to 6.7× and the speedup by up to 19.2× over four state-of-the-art DNN accelerators, when benchmarked on seven DNN models (including four standard DNNs, two compact DNN models, and one segmentation model) and three datasets.


page 1

page 5

page 11

page 12


SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training

The record-breaking performance of deep neural networks (DNNs) comes wit...

Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

The success of DNN pruning has led to the development of energy-efficien...

CREW: Computation Reuse and Efficient Weight Storage for Hardware-accelerated MLPs and RNNs

Deep Neural Networks (DNNs) have achieved tremendous success for cogniti...

A Power and Area Efficient Lepton Hardware Encoder with Hash-based Memory Optimization

Although it has been surpassed by many subsequent coding standards, JPEG...

A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation

The computing wall and data movement challenges of deep neural networks ...

MPDCompress - Matrix Permutation Decomposition Algorithm for Deep Neural Network Compression

Deep neural networks (DNNs) have become the state-of-the-art technique f...

An Algorithm-Hardware Co-design Framework to Overcome Imperfections of Mixed-signal DNN Accelerators

In recent years, processing in memory (PIM) based mixedsignal designs ha...

Please sign up or login with your details

Forgot password? Click here to reset