Towards Theoretically Inspired Neural Initialization Optimization

10/12/2022
by   Yibo Yang, et al.

Automated machine learning has been widely explored to reduce the human effort involved in designing neural architectures and searching for proper hyperparameters. In the domain of neural initialization, however, similar automated techniques have rarely been studied. Most existing initialization methods are handcrafted and highly dependent on specific architectures. In this paper, we propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network. Specifically, GradCosine is the cosine similarity of sample-wise gradients with respect to the initialized parameters. By analyzing the sample-wise optimization landscape, we show that both the training and test performance of a network can be improved by maximizing GradCosine under a gradient norm constraint. Based on this observation, we further propose the Neural Initialization Optimization (NIO) algorithm. Generalizing the sample-wise analysis to the real batch setting, NIO automatically finds a better initialization at a cost that is negligible compared with the training time. With NIO, we improve the classification performance of a variety of neural architectures on CIFAR-10, CIFAR-100, and ImageNet. Moreover, we find that our method can even help train a large vision Transformer architecture without warmup.
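The quantity described above is easy to probe directly. Below is a minimal PyTorch sketch, written as our own illustration rather than the authors' released implementation, that estimates GradCosine on a freshly initialized model as the average pairwise cosine similarity between per-sample gradient vectors; the function name `grad_cosine` and the toy linear classifier are assumptions for demonstration, and the paper's NIO algorithm generalizes this sample-wise computation to a batch-wise setting.

```python
import torch
import torch.nn as nn

def grad_cosine(model, inputs, targets, loss_fn=nn.CrossEntropyLoss()):
    """Average cosine similarity between sample-wise gradients w.r.t. model parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    per_sample_grads = []
    for x, y in zip(inputs, targets):
        # One forward/backward pass per sample to obtain its gradient vector.
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        per_sample_grads.append(torch.cat([g.reshape(-1) for g in grads]))
    G = torch.stack(per_sample_grads)                # (n_samples, n_params)
    G = G / (G.norm(dim=1, keepdim=True) + 1e-12)    # unit-normalize each gradient
    cos = G @ G.t()                                  # pairwise cosine similarities
    n = cos.size(0)
    return (cos.sum() - n) / (n * (n - 1))           # mean over i != j (drop the diagonal)

# Example: probe a freshly initialized (toy) model on a small random batch.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
print(grad_cosine(model, x, y).item())
```

In this sketch, a value close to 1 means the per-sample gradients point in similar directions at initialization, which is the condition the paper argues should be maximized under a gradient norm constraint.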


