Why Quantization Improves Generalization: NTK of Binary Weight Neural Networks

by Kaiqi Zhang, et al.

Quantized neural networks have drawn much attention because they reduce space and computational cost during inference. Moreover, folklore holds that quantization acts as an implicit regularizer and can thus improve the generalizability of neural networks, yet no existing work formalizes this interesting folklore. In this paper, we treat the binary weights of a neural network as random variables under stochastic rounding and study how their distribution propagates across the layers of the network. We propose a quasi neural network, a network with continuous parameters and a smooth activation function, to approximate this distribution propagation. We derive the neural tangent kernel (NTK) of the quasi neural network and show that its eigenvalues decay at an approximately exponential rate, comparable to that of a Gaussian kernel with randomized scale. This in turn indicates that the Reproducing Kernel Hilbert Space (RKHS) of a binary weight neural network covers a strict subset of the functions covered by its real-valued-weight counterpart. Our experiments verify that the proposed quasi neural network approximates a binary weight neural network well. Furthermore, binary weight neural networks achieve a lower generalization gap than real-valued-weight networks, mirroring the difference between the Gaussian kernel and the Laplace kernel.
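As a minimal sketch of the stochastic-rounding setup the abstract describes, the snippet below (function name and tolerances are illustrative, not from the paper) rounds real-valued weights in [-1, 1] to binary values ±1 with probabilities chosen so that each binary weight is an unbiased random variable whose mean equals the original weight:

```python
import numpy as np

def stochastic_binarize(w, rng):
    """Round each weight to +1 or -1 under stochastic rounding.

    Weights are clipped to [-1, 1]; b = +1 with probability (1 + w) / 2
    and b = -1 otherwise, so that E[b] = w (the rounding is unbiased).
    """
    w = np.clip(w, -1.0, 1.0)
    p_plus = (1.0 + w) / 2.0
    return np.where(rng.random(w.shape) < p_plus, 1.0, -1.0)

rng = np.random.default_rng(0)
w = np.array([-0.8, -0.2, 0.0, 0.5, 0.9])

# Averaging many independent binarizations recovers the real weights,
# which is why the binary weights can be treated as random variables
# whose distribution propagates through the layers.
samples = np.stack([stochastic_binarize(w, rng) for _ in range(10_000)])
print(samples.mean(axis=0))  # approximately equal to w
```

Each forward pass with freshly sampled binary weights is one draw from this distribution; the paper's quasi neural network approximates how the resulting mean and variance propagate layer by layer.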

