Simulated Annealing in Early Layers Leads to Better Generalization

by   AmirMohammad Sarfi, et al.

Recently, a number of iterative learning methods have been introduced to improve generalization. These typically rely on training for longer periods of time in exchange for improved generalization. LLF (later-layer-forgetting) is a state-of-the-art method in this category. It strengthens learning in early layers by periodically re-initializing the last few layers of the network. Our principal innovation in this work is to use Simulated annealing in EArly Layers (SEAL) of the network in place of re-initialization of later layers. Essentially, later layers go through the normal gradient descent process, while the early layers go through short stints of gradient ascent followed by gradient descent. Extensive experiments on the popular Tiny-ImageNet dataset benchmark and a series of transfer learning and few-shot learning tasks show that we outperform LLF by a significant margin. We further show that, compared to normal training, LLF features, although improving on the target task, degrade the transfer learning performance across all datasets we explored. In comparison, our method outperforms LLF across the same target datasets by a large margin. We also show that the prediction depth of our method is significantly lower than that of LLF and normal training, indicating on average better prediction performance.


page 1

page 2

page 3

page 4


SA-GD: Improved Gradient Descent Learning Strategy with Simulated Annealing

Gradient descent algorithm is the most utilized method when optimizing m...

Overfreezing Meets Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks

We study the generalization behavior of transfer learning of deep neural...

Improving Meta-Learning Generalization with Activation-Based Early-Stopping

Meta-Learning algorithms for few-shot learning aim to train neural netwo...

Predicting the success of Gradient Descent for a particular Dataset-Architecture-Initialization (DAI)

Despite their massive success, training successful deep neural networks ...

Neural Networks can Learn Representations with Gradient Descent

Significant theoretical work has established that in specific regimes, n...

An analytic theory of generalization dynamics and transfer learning in deep linear networks

Much attention has been devoted recently to the generalization puzzle in...

Deep Learning Through the Lens of Example Difficulty

Existing work on understanding deep learning often employs measures that...

Please sign up or login with your details

Forgot password? Click here to reset