SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation

by Cong Guo, et al.

Quantization of deep neural networks (DNNs) has proven effective for compressing and accelerating DNN models. Data-free quantization (DFQ) is a promising approach that works without the original datasets in privacy-sensitive and confidential scenarios. However, current DFQ solutions degrade accuracy, need synthetic data to calibrate networks, and are time-consuming and costly. This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements. Based on a theoretical analysis of the second-order information of the DNN task loss, we decompose and approximate the Hessian-based optimization objective into three diagonal sub-items, which cover different areas corresponding to three dimensions of the weight tensor: element-wise, kernel-wise, and output channel-wise. Then, we progressively compose the sub-items and propose a novel data-free optimization objective in the discrete domain, minimizing the Constrained Absolute Sum of Error (CASE for short), which surprisingly does not need any dataset and is not even aware of the network architecture. We also design an efficient algorithm without back-propagation to further reduce the computational complexity of the objective solver. Finally, without fine-tuning or synthetic datasets, SQuant accelerates the data-free quantization process to a sub-second level with >30% accuracy improvement over existing data-free post-training quantization works, for the evaluated models under 4-bit quantization. We have open-sourced the SQuant framework at https://github.com/clevercool/SQuant.
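
To make the idea of a constrained absolute sum of error more concrete, the following is a minimal sketch of it for a single kernel. It is not the authors' implementation. The function name `quantize_kernel_case`, the 0.5 bound on the summed error, and the "flip the largest rounding error first" rule are assumptions made for illustration. The sketch rounds each weight to the nearest integer, then changes the rounding direction of a few elements until the errors in the kernel nearly cancel, using no data and no back-propagation.

```python
import math

def quantize_kernel_case(weights, scale, bound=0.5, eps=1e-9):
    """Hypothetical helper: quantize one flattened kernel to integers so that
    the absolute sum of rounding errors is at most `bound` (in grid units)."""
    scaled = [w / scale for w in weights]
    # Element-wise step: round to nearest, so each error lies in (-0.5, 0.5].
    q = [math.floor(s + 0.5) for s in scaled]
    err = [qi - si for qi, si in zip(q, scaled)]
    total = sum(err)

    # Kernel-wise step: if the kernel was rounded up too much overall,
    # round down the element with the largest positive error, and repeat.
    while total > bound + eps:
        i = max(range(len(err)), key=err.__getitem__)
        q[i] -= 1
        err[i] -= 1.0
        total -= 1.0

    # Symmetric case: the kernel was rounded down too much overall.
    while total < -bound - eps:
        i = min(range(len(err)), key=err.__getitem__)
        q[i] += 1
        err[i] += 1.0
        total += 1.0

    return q

# Plain rounding maps every 0.3 to 0, losing the kernel sum of 1.2 entirely.
print(quantize_kernel_case([0.3, 0.3, 0.3, 0.3], scale=1.0))  # [1, 0, 0, 0]
```

In this toy case one element is rounded up instead of down, so the quantized kernel sums to 1 rather than 0 while every individual error stays below 1. The sketch omits clamping to the bit-width range and the output-channel-wise step described in the abstract; the open-source repository is the reference for the actual algorithm.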




QuIP: 2-Bit Quantization of Large Language Models With Guarantees

This work studies post-training parameter quantization in large language...

Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning

Neural network quantization is a very promising solution in the field of...

EPTQ: Enhanced Post-Training Quantization via Label-Free Hessian

Quantization of deep neural networks (DNN) has become a key element in t...

ClusterQ: Semantic Feature Distribution Alignment for Data-Free Quantization

Network quantization has emerged as a promising method for model compres...

Channel-wise Hessian Aware trace-Weighted Quantization of Neural Networks

Second-order information has proven to be very effective in determining ...

Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

Quantized neural networks typically require smaller memory footprints an...

Deep Conditional Measure Quantization

The quantization of a (probability) measure is replacing it by a sum of ...
