Learning Curves for Sequential Training of Neural Networks: Self-Knowledge Transfer and Forgetting

by Ryo Karakida, et al.

Sequential training from task to task is becoming a major subject in deep learning applications such as continual learning and transfer learning. Nevertheless, it remains unclear under what conditions the trained model's performance improves or deteriorates. To deepen our understanding of sequential training, this study provides a theoretical analysis of generalization performance in a solvable case of continual learning. We consider neural networks in the neural tangent kernel (NTK) regime that continually learn target functions from task to task, and we investigate their generalization by using an established statistical mechanical analysis of kernel ridgeless regression. We first show characteristic transitions from positive to negative transfer: targets whose similarity exceeds a specific critical value can achieve positive knowledge transfer to the subsequent task, while catastrophic forgetting occurs even with very similar targets. Next, we investigate a variant of continual learning in which the model learns the same target function in multiple tasks. Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task. We can guarantee that the generalization error decreases monotonically from task to task when the sample sizes are equal, whereas unbalanced sample sizes degrade generalization. We refer to this improvement and deterioration as self-knowledge transfer and forgetting, respectively, and empirically confirm them in realistic training of deep neural networks as well.
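A minimal NumPy sketch of the setting described above, under several assumptions that are not taken from the paper. In the NTK regime, training a network to convergence on task n, starting from the parameters reached after task n-1, yields the ridgeless kernel predictor f_n(x) = f_{n-1}(x) + K(x, X_n) K(X_n, X_n)^{-1} (y_n - f_{n-1}(X_n)). The kernel used here is the NTK of a bias-free two-layer ReLU network with standard Gaussian initialization, chosen only for illustration; the helper names (`relu_ntk`, `train_task`, `sphere`, `run`) and the `tanh` target are hypothetical. The paper's results concern the generalization error averaged over training samples, so a single random draw like this one need not follow the average trend exactly.

```python
import numpy as np


def relu_ntk(X1, X2):
    # NTK of a two-layer ReLU net f(x) = m^{-1/2} sum_r a_r relu(w_r . x),
    # both layers trained, standard Gaussian init, no biases (an assumption;
    # the paper's architecture may differ).
    n1 = np.linalg.norm(X1, axis=1)[:, None]
    n2 = np.linalg.norm(X2, axis=1)[None, :]
    dot = X1 @ X2.T
    u = np.clip(dot / (n1 * n2), -1.0, 1.0)
    theta = np.arccos(u)
    k0 = (np.pi - theta) / (2 * np.pi)  # E[relu'(w.x) relu'(w.x')]
    k1 = n1 * n2 * (np.sin(theta) + (np.pi - theta) * u) / (2 * np.pi)  # E[relu(w.x) relu(w.x')]
    return dot * k0 + k1


def train_task(prev_f, X, y, kernel=relu_ntk):
    # Ridgeless kernel regression initialized at the previous task's predictor:
    # f_n(x) = f_{n-1}(x) + K(x, X) K(X, X)^{-1} (y - f_{n-1}(X)).
    alpha = np.linalg.solve(kernel(X, X), y - prev_f(X))
    return lambda Z: prev_f(Z) + kernel(Z, X) @ alpha


def sphere(n, d, rng):
    # Inputs sampled uniformly on the unit sphere.
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def run(sizes, d=10, seed=0):
    # Learn the same (hypothetical) target in len(sizes) consecutive tasks and
    # record the test mean squared error after each task.
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)
    target = lambda X: np.tanh(X @ beta)
    X_test = sphere(1000, d, rng)
    f = lambda Z: np.zeros(len(Z))  # assume the network output is ~0 at initialization
    errors = []
    for n in sizes:
        X = sphere(n, d, rng)
        f = train_task(f, X, target(X))
        errors.append(float(np.mean((f(X_test) - target(X_test)) ** 2)))
    return errors, f, X, target


equal_errors, f_eq, X_last, target_eq = run([50, 50, 50])
unbalanced_errors, _, _, _ = run([140, 5, 5])
print("equal sizes:     ", equal_errors)
print("unbalanced sizes:", unbalanced_errors)
```

Both runs use the same total number of samples, so comparing `equal_errors` with `unbalanced_errors` averaged over many values of `seed` is one way to probe the self-knowledge transfer and forgetting described in the abstract. Because each task is fit without a ridge term, the predictor after every task interpolates that task's training data exactly (up to floating-point error).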


I2I: Initializing Adapters with Improvised Knowledge

Adapters present a promising solution to the catastrophic forgetting pro...

AFEC: Active Forgetting of Negative Transfer in Continual Learning

Continual learning aims to learn a sequence of tasks from dynamic data d...

Continual Learning in the Teacher-Student Setup: Impact of Task Similarity

Continual learning-the ability to learn many tasks in sequence-is critic...

Transfer Learning via Test-Time Neural Networks Aggregation

It has been demonstrated that deep neural networks outperform traditiona...

Theory on Forgetting and Generalization of Continual Learning

Continual learning (CL), which aims to learn a sequence of tasks, has at...

Toward Understanding Catastrophic Forgetting in Continual Learning

We study the relationship between catastrophic forgetting and properties...

EXPANSE: A Deep Continual / Progressive Learning System for Deep Transfer Learning

Deep transfer learning techniques try to tackle the limitations of deep ...
