PowerQuant: Automorphism Search for Non-Uniform Quantization

by Edouard Yvinec, et al.

Deep neural networks (DNNs) are nowadays ubiquitous in many domains such as computer vision. However, due to their high latency, the deployment of DNNs hinges on the development of compression techniques such as quantization, which lowers the number of bits used to encode the weights and activations. Growing concerns for privacy and security have motivated the development of data-free techniques, at the expense of accuracy. In this paper, we identify the uniformity of the quantization operator as a limitation of existing approaches, and propose a data-free non-uniform method. More specifically, we argue that to be readily usable without dedicated hardware and implementation, non-uniform quantization must not change the nature of the mathematical operations performed by the DNN. This leads to a search among the continuous automorphisms of (ℝ_+^*,×), which boils down to the power functions defined by their exponent. To find this parameter, we propose to optimize the reconstruction error of each layer: in particular, we show that this procedure is locally convex and admits a unique solution. At inference time, we show that our approach, dubbed PowerQuant, only requires simple modifications in the quantized DNN activation functions. As such, with only negligible overhead, it significantly outperforms existing methods in a variety of configurations.
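To make the idea concrete, the following is a minimal sketch of power-function quantization for a single layer, not the authors' implementation. It assumes symmetric signed quantization with one scale per layer, applies the transform sign(w)·|w|^a before uniform rounding, and inverts it with the exponent 1/a. The abstract states that the exponent search is locally convex; the sketch replaces that optimization with a plain grid search over a hypothetical candidate range of 0.1 to 1.0. All function names and the bit width are illustrative assumptions.

```python
import math
import random


def power_quantize(weights, exponent, bits=4):
    """Map each weight through sign(w) * |w|**exponent, then round uniformly."""
    levels = 2 ** (bits - 1) - 1                      # symmetric signed range
    max_abs = max(abs(w) for w in weights) or 1.0     # avoid a zero scale
    scale = max_abs ** exponent / levels
    codes = [int(math.copysign(round(abs(w) ** exponent / scale), w))
             for w in weights]
    return codes, scale


def power_dequantize(codes, scale, exponent):
    """Invert the power transform with the exponent 1 / exponent."""
    return [math.copysign((abs(q) * scale) ** (1.0 / exponent), q)
            for q in codes]


def reconstruction_error(weights, exponent, bits=4):
    """Sum of squared differences between weights and their reconstruction."""
    codes, scale = power_quantize(weights, exponent, bits)
    restored = power_dequantize(codes, scale, exponent)
    return sum((w - r) ** 2 for w, r in zip(weights, restored))


def search_exponent(weights, bits=4, candidates=None):
    """Grid search (an assumption of this sketch) for the best exponent."""
    if candidates is None:
        # Hypothetical range; 1.0 corresponds to uniform quantization.
        candidates = [i / 20 for i in range(2, 21)]
    errors = [(reconstruction_error(weights, a, bits), a) for a in candidates]
    best_error, best_exponent = min(errors)
    return best_exponent, best_error


if __name__ == "__main__":
    random.seed(0)
    layer = [random.gauss(0.0, 1.0) for _ in range(1000)]  # toy weights
    a, err = search_exponent(layer)
    print("exponent:", a, "error:", err)
    print("uniform error:", reconstruction_error(layer, 1.0))
```

Because the candidate grid contains the exponent 1.0, which reduces to uniform quantization, the selected exponent can never give a larger reconstruction error than the uniform baseline on the same weights. The transform is multiplicative, (xy)^a = x^a·y^a, which is the automorphism property the abstract relies on to keep the network's operations unchanged.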


NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search

Deep neural network (DNN) deployment has been confined to larger hardwar...

REx: Data-Free Residual Quantization Error Expansion

Deep neural networks (DNNs) are nowadays ubiquitous in the computer visi...

Flex-SFU: Accelerating DNN Activation Functions by Non-Uniform Piecewise Approximation

Modern DNN workloads increasingly rely on activation functions consistin...

Soft Threshold Weight Reparameterization for Learnable Sparsity

Sparsity in Deep Neural Networks (DNNs) is studied extensively with the ...

SPIQ: Data-Free Per-Channel Static Input Quantization

Computationally expensive neural networks are ubiquitous in computer vis...

DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks

Deep neural networks have achieved state-of-the art performance on vario...
