Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize

07/29/2020
by Jun Shu, et al.

The learning rate (LR) is one of the most important hyper-parameters in stochastic gradient descent (SGD) for training deep neural networks (DNNs) and for their generalization. However, current hand-designed LR schedules require manually pre-specifying both the schedule form and its extra hyper-parameters, which limits their ability to adapt to non-convex optimization problems because of the significant variation of training dynamics. To address this issue, we propose a model capable of adaptively learning an LR schedule from data. Specifically, we design a meta-learner with an explicit mapping formulation to parameterize LR schedules, which adjusts the LR adaptively to comply with the current training dynamics by leveraging information from past training histories. Experiments on image and text classification benchmarks substantiate the capability of our method to find proper LR schedules compared with baseline methods. Moreover, we transfer the learned LR schedule to various other tasks, such as different training batch sizes, epochs, datasets, and network architectures, including the large-scale ImageNet dataset, showing stronger generalization capability than related methods. Finally, guided by a small clean validation set, we show that our method achieves better generalization error than baseline methods when the training data is biased with corrupted noise.
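To make the idea concrete, below is a minimal PyTorch sketch of a meta-learner that maps recent training statistics (here, the current loss and an exponential moving average of past losses) to a bounded learning rate used for the task model's SGD step. The LRScheduleNet name, its input features, and the network shape are illustrative assumptions rather than the paper's exact Meta-LR-Schedule-Net formulation, and the meta-level update of the scheduler itself (e.g., against a held-out validation loss) is only indicated in a comment.

# Illustrative sketch of a meta-learned LR schedule (hypothetical architecture,
# not the exact Meta-LR-Schedule-Net formulation from the paper).
import torch
import torch.nn as nn

class LRScheduleNet(nn.Module):
    """Maps recent training statistics (current loss and a moving
    average of past losses) to a learning rate in (0, lr_max)."""
    def __init__(self, in_dim=2, hidden=32, lr_max=0.1):
        super().__init__()
        self.lr_max = lr_max
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, stats):
        # Sigmoid keeps the predicted LR positive and bounded by lr_max.
        return self.lr_max * torch.sigmoid(self.net(stats))

# Toy usage: scale the SGD step of a task model by the predicted LR.
model = nn.Linear(10, 1)
meta_lr_net = LRScheduleNet()
loss_fn = nn.MSELoss()
ema_loss = None

for step in range(100):
    x, y = torch.randn(16, 10), torch.randn(16, 1)
    loss = loss_fn(model(x), y)
    ema_loss = loss.item() if ema_loss is None else 0.9 * ema_loss + 0.1 * loss.item()

    # LR predicted from the current training history.
    stats = torch.tensor([[loss.item(), ema_loss]])
    lr = meta_lr_net(stats).item()

    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
    # In the full method, meta_lr_net itself would also be updated by a
    # meta-objective, e.g. the loss on a held-out (clean) validation batch.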


Related research:

09/20/2019  Learning an Adaptive Learning Rate Schedule
The learning rate is one of the most important hyper-parameters for mode...

02/11/2022  CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning
Modern deep neural networks can easily overfit to biased training data c...

11/19/2018  Deep Frank-Wolfe For Neural Network Optimization
Learning a deep neural network requires solving a challenging optimizati...

08/08/2020  Meta Feature Modulator for Long-tailed Recognition
Deep neural networks often degrade significantly when training data suff...

02/24/2018  A Walk with SGD
Exploring why stochastic gradient descent (SGD) based optimization metho...

02/20/2019  Tug the Student to Learn Right: Progressive Gradient Correcting by Meta-learner on Corrupted Labels
While deep networks have strong fitting capability to complex input patt...

02/20/2019  Push the Student to Learn Right: Progressive Gradient Correcting by Meta-learner on Corrupted Labels
While deep networks have strong fitting capability to complex input patt...
