Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

by H. T. Kung, et al.

We present a novel technique, called Term Revealing (TR), for furthering quantization at run time to improve the performance of Deep Neural Networks (DNNs) already quantized with conventional quantization methods. TR operates on the power-of-two terms in the binary expressions of values. When computing a dot product, TR dynamically selects a fixed number of the largest terms to use from the values of the two vectors. By exploiting the normal-like weight and data distributions typically present in DNNs, TR has a minimal impact on DNN model performance (i.e., accuracy or perplexity). We use TR to facilitate tightly synchronized processor arrays, such as systolic arrays, for efficient parallel processing. We show an FPGA implementation that can use a small number of control bits to switch between conventional quantization and TR-enabled quantization with a negligible delay. To further enhance TR efficiency, we propose HESE (Hybrid Encoding for Signed Expressions), which encodes values with signed power-of-two terms, as opposed to classic binary encoding with only nonnegative power-of-two terms. We evaluate TR with HESE-encoded values on an MLP for MNIST, multiple CNNs for ImageNet, and an LSTM for Wikitext-2, and show significant reductions in inference computations (3-10x) compared to conventional quantization for the same level of model performance.
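The core idea can be sketched in a few lines: decompose each value into signed power-of-two terms (a HESE-like canonical signed-digit recoding, so a run of ones such as 7 = 4+2+1 becomes the two terms 8-1), keep only the k largest-magnitude terms of each operand, and form the dot product from the truncated values. This is a minimal illustrative sketch, not the paper's implementation; all function names are hypothetical.

```python
def signed_terms(x):
    """Decompose integer x into signed power-of-two terms using a
    canonical signed-digit style recoding: a run of ones is replaced
    by a subtraction (e.g. 7 = 8 - 1), reducing the term count."""
    terms = []
    sign = 1 if x >= 0 else -1
    n = abs(x)
    i = 0
    while n:
        if n & 1:
            if n & 2:                      # run of ones: emit -2^i, carry up
                terms.append(-sign * (1 << i))
                n += 1
            else:
                terms.append(sign * (1 << i))
        n >>= 1
        i += 1
    return terms

def reveal_top_terms(x, k):
    """Term Revealing: keep only the k largest-magnitude terms of x."""
    terms = sorted(signed_terms(x), key=abs, reverse=True)
    return sum(terms[:k])

def tr_dot(ws, xs, k):
    """Approximate dot product using at most k terms per operand."""
    return sum(reveal_top_terms(w, k) * reveal_top_terms(x, k)
               for w, x in zip(ws, xs))
```

With k large enough to cover all terms, `tr_dot` is exact; with a small k, values with many terms are rounded toward their dominant terms, which matters little when weights and activations follow normal-like distributions with few large-magnitude values.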

