CascadeCNN: Pushing the performance limits of quantisation

05/22/2018
by Alexandros Kouris, et al.

This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model to perform high-throughput inference, exploiting the trade-off between computation time and accuracy. Without the need for retraining, a two-stage architecture tailored for any given FPGA device is generated, consisting of a low- and a high-precision unit. A confidence evaluation unit is employed between them to identify misclassified cases at run time and either forward them to the high-precision unit or terminate computation. Experiments demonstrate that CascadeCNN achieves a performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy.
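The following sketch illustrates the run-time behaviour described above. It is a minimal NumPy mock-up, not the actual CascadeCNN toolflow: `low_precision_model`, `high_precision_model`, the `threshold` value, and the top-1/top-2 probability margin used as the confidence metric are all illustrative assumptions; the paper defines its own confidence criterion and derives the precision of each unit automatically.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cascade_infer(batch, low_precision_model, high_precision_model, threshold=0.5):
    """Two-stage cascade (hypothetical sketch): every sample passes through
    the low-precision unit; only samples whose confidence falls below
    `threshold` are re-processed by the high-precision unit."""
    probs = softmax(low_precision_model(batch))      # logits -> (N, num_classes)
    top2 = np.sort(probs, axis=-1)[:, -2:]           # two largest class probabilities
    confident = (top2[:, 1] - top2[:, 0]) >= threshold
    preds = probs.argmax(axis=-1)
    if not confident.all():                          # re-run only the hard cases
        hard = batch[~confident]
        preds[~confident] = softmax(high_precision_model(hard)).argmax(axis=-1)
    return preds
```

Only the samples flagged as low-confidence pay the cost of the high-precision unit; everything else terminates at the fast, aggressively quantised stage, which is where the reported throughput gain comes from.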


