Gradient Centralization: A New Optimization Technique for Deep Neural Networks

by Hongwei Yong, et al.

Optimization techniques are of great importance for training a deep neural network (DNN) effectively and efficiently. It has been shown that using first- and second-order statistics (e.g., mean and variance) to perform Z-score standardization on network activations or weight vectors, as in batch normalization (BN) and weight standardization (WS), can improve training performance. Unlike these existing methods, which mostly operate on activations or weights, we present a new optimization technique, gradient centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean. GC can be viewed as a projected gradient descent method with a constrained loss function. We show that GC regularizes both the weight space and the output feature space, which can boost the generalization performance of DNNs. Moreover, GC improves the Lipschitzness of the loss function and its gradient, so that the training process becomes more efficient and stable. GC is very simple to implement and can be embedded into existing gradient-based DNN optimizers with only one line of code. It can also be used directly to fine-tune pre-trained DNNs. Our experiments on various applications, including general image classification, fine-grained image classification, detection, and segmentation, demonstrate that GC consistently improves the performance of DNN learning. The code of GC can be found at
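
To make the "centralizing the gradient vectors to have zero mean" step concrete, here is a minimal NumPy sketch. It is not the authors' released code. It assumes weight tensors are laid out with the output channel as the first axis (as in PyTorch convolution and linear layers), so each output unit's gradient slice is centered on its own, and it leaves 1-D parameters such as biases untouched. The function names `centralize_gradient` and `sgd_step_with_gc` are hypothetical, chosen only for illustration.

```python
import numpy as np


def centralize_gradient(grad):
    """Subtract the per-output-unit mean from a weight gradient.

    Assumption: axis 0 indexes output units/channels, and every remaining
    axis belongs to that unit's weight vector. 1-D gradients (e.g. biases)
    are returned unchanged.
    """
    grad = np.asarray(grad, dtype=float)
    if grad.ndim <= 1:
        return grad
    # Average over every axis except the first, keeping dims for broadcasting.
    axes = tuple(range(1, grad.ndim))
    return grad - grad.mean(axis=axes, keepdims=True)


def sgd_step_with_gc(weight, grad, lr=0.1):
    """Plain SGD update with GC applied to the gradient first (illustrative)."""
    return weight - lr * centralize_gradient(grad)


# Example: a conv-style gradient of shape (out_channels, in_channels, kH, kW).
rng = np.random.default_rng(0)
conv_weight = rng.normal(size=(4, 3, 3, 3))
conv_grad = rng.normal(size=(4, 3, 3, 3))

centered = centralize_gradient(conv_grad)
new_weight = sgd_step_with_gc(conv_weight, conv_grad)

# Bias-like 1-D gradients pass through unchanged.
bias_grad = np.array([0.5, -1.0, 2.0])
bias_out = centralize_gradient(bias_grad)
```

Because each centered slice sums to zero, the update leaves the sum of every output unit's weights unchanged. This is one way to read the abstract's description of GC as projected gradient descent with a constraint on the weights.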




XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

In this paper, we propose a general deep learning training framework XGr...

TRADI: Tracking deep neural network weight distributions

During training, the weights of a Deep Neural Network (DNN) are optimize...

Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

Advanced deep neural networks (DNNs), designed by either human or AutoML...

Population-Based Training for Loss Function Optimization

Metalearning of deep neural network (DNN) architectures and hyperparamet...

Efficient Generalization Improvement Guided by Random Weight Perturbation

To fully uncover the great potential of deep neural networks (DNNs), var...

PAL: A fast DNN optimization method based on curvature information

We present a novel optimizer for deep neural networks that combines the ...

Better Training using Weight-Constrained Stochastic Dynamics

We employ constraints to control the parameter space of deep neural netw...
