CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks

07/13/2018

This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, aiming to perform high-throughput inference. The toolflow generates a two-stage architecture tailored to the given CNN-FPGA pair, consisting of a low-precision unit and a high-precision unit arranged in a cascade. A confidence evaluation unit identifies inputs misclassified by the excessively low-precision unit and forwards them to the high-precision unit for re-processing. Experiments demonstrate that the proposed toolflow can achieve a performance boost of up to 55% over the baseline design for the same resource budget and accuracy, without retraining the model or accessing the training data.
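
The control flow of the cascade described above can be sketched in software. This is only an illustrative sketch and not the generated FPGA design: the names `cascade_predict`, `is_confident`, `toy_low` and `toy_high` are hypothetical, the margin between the two highest softmax probabilities is an assumed confidence metric (the abstract does not specify one), and the threshold of 0.5 is arbitrary.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A classifier maps an input to a list of raw class scores (logits).
Classifier = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_confident(probs: Sequence[float], threshold: float) -> bool:
    """Assumed metric: margin between the best and second-best class."""
    top = sorted(probs, reverse=True)
    return (top[0] - top[1]) >= threshold


def cascade_predict(
    x: Sequence[float],
    low_precision: Classifier,
    high_precision: Classifier,
    threshold: float = 0.5,  # arbitrary value chosen for illustration
) -> Tuple[int, str]:
    """Return (predicted class, name of the unit that produced it)."""
    probs = softmax(low_precision(x))
    if is_confident(probs, threshold):
        return probs.index(max(probs)), "low"
    # Low-confidence input: re-process it on the high-precision unit.
    probs = softmax(high_precision(x))
    return probs.index(max(probs)), "high"


# Toy stand-ins: the input is treated directly as logits, and coarse
# rounding imitates the error introduced by aggressive quantisation.
def toy_high(x: Sequence[float]) -> List[float]:
    return list(x)


def toy_low(x: Sequence[float]) -> List[float]:
    return [round(v) for v in x]


if __name__ == "__main__":
    print(cascade_predict([4.0, 0.2, 0.1], toy_low, toy_high))
    print(cascade_predict([1.4, 1.6, 0.0], toy_low, toy_high))
```

In this sketch the first input has a clear winner even after rounding, so it is answered by the low-precision path, while the second has two close scores and is forwarded to the high-precision path. The throughput benefit depends on most inputs taking the first path.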
