Training Deep Neural Networks Without Batch Normalization

by   Divya Gaur, et al.

Training a neural network is an optimization problem, and finding a good set of parameters through gradient descent can be difficult. A host of techniques has been developed to aid this process before and during training. One of the most important and widely used classes of methods is normalization. It is generally favorable for neurons to receive inputs distributed with zero mean and unit variance, so dataset statistics are used to normalize the inputs before the first layer. However, this property cannot be guaranteed for the intermediate activations inside the network. The most widely used method for enforcing it inside the network is batch normalization, which was developed to combat internal covariate shift. Empirically it is known to work, but there is a lack of theoretical understanding of its effectiveness and of potential drawbacks it may have in practice. This work studies batch normalization in detail and compares it with other methods such as weight normalization, gradient clipping, and dropout. The main purpose of this work is to determine whether networks can be trained effectively when batch normalization is removed, by adapting the training process.
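The normalization the abstract refers to can be sketched in a few lines: each feature is standardized over the batch dimension to zero mean and unit variance, then rescaled by learnable parameters. This is an illustrative NumPy sketch of the training-time forward pass only (the function name, the `eps` constant, and the omission of running statistics and the backward pass are simplifications, not the paper's implementation):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch_size, num_features) activations.
    # Standardize each feature over the batch dimension ...
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # ... then apply the learnable affine transform (gamma, beta).
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 4))  # batch of 32, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0))  # close to zero for every feature
print(y.std(axis=0))   # close to one for every feature
```

At inference time, real implementations replace the per-batch mean and variance with running averages collected during training, since a single test instance has no meaningful batch statistics — one of the practical complications this line of work tries to remove.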




Weight and Gradient Centralization in Deep Neural Networks

Batch normalization is currently the most widely used variant of interna...

Batchless Normalization: How to Normalize Activations with just one Instance in Memory

In training neural networks, batch normalization has many benefits, not ...

Revisit Batch Normalization: New Understanding from an Optimization View and a Refinement via Composition Optimization

Batch Normalization (BN) has been used extensively in deep learning to a...

Theoretical Insight into Batch Normalization: Data Dependant Auto-Tuning of Regularization Rate

Batch normalization is widely used in deep learning to normalize interme...

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

In this work, we propose a novel technique to boost training efficiency ...

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

We present weight normalization: a reparameterization of the weight vect...

Correct Normalization Matters: Understanding the Effect of Normalization On Deep Neural Network Models For Click-Through Rate Prediction

Normalization has become one of the most fundamental components in many ...
