Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory

by Elaina Chai et al.

Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability. However, there is still limited consensus on why this technique is effective. This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm. First, we show that the convolution weight updates have natural modes whose stability and convergence speed are tied to the eigenvalues of the input autocorrelation matrices, which are controlled by BatchNorm through the convolution layers' channel-wise structure. Furthermore, our experiments demonstrate that the speed and stability benefits are distinct effects. At low learning rates, it is BatchNorm's amplification of the smallest eigenvalues that improves convergence speed, while at high learning rates, it is BatchNorm's suppression of the largest eigenvalues that ensures stability. Lastly, we prove that in the first training step, when normalization is needed most, BatchNorm satisfies the same optimization criterion as the Normalized Least Mean Square (NLMS) algorithm, and that it continues to approximate this condition in subsequent steps. The analyses provided in this paper lay the groundwork for gaining further insight into the operation of modern neural network structures using adaptive filter theory.
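The link the abstract draws between channel-wise normalization and the eigenvalue spread of the input autocorrelation matrix can be illustrated numerically. The sketch below (not from the paper; the synthetic channel scales and the per-channel normalization are illustrative assumptions) compares the condition number of the autocorrelation matrix R = E[x xᵀ] before and after BatchNorm-style normalization: shrinking the eigenvalue spread is what, in classical adaptive filter theory, speeds up and stabilizes the natural modes of LMS-like weight updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic inputs: 4 "channels" with very different scales, standing in
# for pre-activation features entering a convolution layer (illustrative).
n_samples, n_channels = 10_000, 4
scales = np.array([0.1, 1.0, 5.0, 20.0])
X = rng.standard_normal((n_samples, n_channels)) * scales

def autocorr_eigs(X):
    """Sorted eigenvalues of the sample autocorrelation matrix R = E[x x^T]."""
    R = X.T @ X / X.shape[0]
    return np.sort(np.linalg.eigvalsh(R))

def channel_norm(X, eps=1e-5):
    """Per-channel zero-mean/unit-variance normalization, as BatchNorm
    applies along the channel dimension (affine scale/shift omitted)."""
    return (X - X.mean(axis=0)) / np.sqrt(X.var(axis=0) + eps)

eigs_raw = autocorr_eigs(X)
eigs_bn = autocorr_eigs(channel_norm(X))

# The eigenvalue spread (condition number) governs the convergence of the
# natural modes: normalization lifts the smallest eigenvalues and caps the
# largest, so the spread collapses toward 1.
print("raw  condition number:", eigs_raw[-1] / eigs_raw[0])
print("normalized condition number:", eigs_bn[-1] / eigs_bn[0])
```

With independent channels, the normalized autocorrelation matrix is close to the identity, so its condition number is near 1, while the raw spread scales with the squared ratio of the largest to smallest channel scale.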


