On the Ideal Number of Groups for Isometric Gradient Propagation

02/07/2023
by Bum Jun Kim, et al.

Recently, various normalization layers have been proposed to stabilize the training of deep neural networks. Among them, group normalization generalizes layer normalization and instance normalization by allowing a degree of freedom in the number of groups it uses. However, determining the optimal number of groups currently requires trial-and-error hyperparameter tuning, and such experiments are time-consuming. In this study, we discuss a principled method for setting the number of groups. First, we observe that the number of groups influences the gradient behavior of the group normalization layer. Based on this observation, we derive the ideal number of groups, which calibrates the gradient scale to facilitate gradient descent optimization. Our proposed number of groups is theoretically grounded, architecture-aware, and can be determined in a layer-wise manner for every layer. The proposed method exhibited improved performance over existing methods across numerous neural network architectures, tasks, and datasets.
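
To make the relationship between group normalization, layer normalization, and instance normalization concrete, here is a minimal sketch using PyTorch's nn.GroupNorm. The helper pick_num_groups is a hypothetical placeholder that illustrates a layer-wise choice of group count; it is not the gradient-calibrated formula derived in the paper.

```python
# Minimal sketch, assuming PyTorch. pick_num_groups is an illustrative
# heuristic (roughly 16 channels per group), NOT the paper's derived value.
import torch
import torch.nn as nn


def pick_num_groups(num_channels: int, target_group_size: int = 16) -> int:
    """Illustrative layer-wise heuristic: aim for about target_group_size
    channels per group, keeping the group count a divisor of num_channels."""
    groups = max(1, num_channels // target_group_size)
    while num_channels % groups != 0:
        groups -= 1
    return groups


channels = 64
x = torch.randn(8, channels, 32, 32)  # (batch, channels, height, width)

# Group normalization spans the two extremes via its number of groups:
layer_norm_like = nn.GroupNorm(num_groups=1, num_channels=channels)         # 1 group  -> layer norm
instance_norm_like = nn.GroupNorm(num_groups=channels, num_channels=channels)  # C groups -> instance norm
group_norm = nn.GroupNorm(num_groups=pick_num_groups(channels),
                          num_channels=channels)                            # layer-wise choice

print(group_norm(x).shape)  # torch.Size([8, 64, 32, 32])
```

The point of the sketch is that the number of groups is a per-layer degree of freedom; the paper's contribution is to set this value in a principled, architecture-aware way rather than by the kind of fixed heuristic shown above.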
