Natural continual learning: success is a journey, not (just) a destination

by Ta-Chu Kao, et al.

Biological agents are known to learn many different tasks over the course of their lives, and to be able to revisit previous tasks and behaviors with little to no loss in performance. In contrast, artificial agents are prone to 'catastrophic forgetting', whereby performance on previous tasks deteriorates rapidly as new ones are acquired. This shortcoming has recently been addressed using methods that encourage parameters to stay close to those used for previous tasks. This can be done by (i) using specific parameter regularizers that map out suitable destinations in parameter space, or (ii) guiding the optimization journey by projecting gradients into subspaces that do not interfere with previous tasks. However, parameter regularization has been shown to be relatively ineffective in recurrent neural networks (RNNs), a setting relevant to the study of neural dynamics supporting biological continual learning. Similarly, projection-based methods can reach capacity and fail to learn any further as the number of tasks increases. To address these limitations, we propose Natural Continual Learning (NCL), a new method that unifies weight regularization and projected gradient descent. NCL uses Bayesian weight regularization to encourage good performance on all tasks at convergence and combines this with gradient projections designed to prevent catastrophic forgetting during optimization. NCL formalizes gradient projection as a trust region algorithm based on the Fisher information metric, and achieves scalability via a novel Kronecker-factored approximation strategy. Our method outperforms both standard weight regularization techniques and projection-based approaches when applied to continual learning problems in RNNs. The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
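The two families of approaches contrasted in the abstract, (i) parameter regularization and (ii) gradient projection, can be illustrated with a toy NumPy sketch. This is not the NCL algorithm and not code from the paper: the diagonal `precision` weights, the `basis` of protected directions, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np


def quadratic_penalty_grad(theta, theta_prev, precision):
    """Gradient of 0.5 * sum(precision * (theta - theta_prev)**2).

    Approach (i): pull parameters back toward the previous-task solution,
    more strongly along coordinates with high (assumed) precision.
    """
    return precision * (theta - theta_prev)


def project_out(grad, basis):
    """Remove the component of grad lying in span(basis).

    Approach (ii): `basis` is assumed to have orthonormal columns spanning
    directions that previous tasks are sensitive to.
    """
    return grad - basis @ (basis.T @ grad)


# Toy setup (all values are made up): three parameters, and the previous
# task is assumed to depend mainly on the first coordinate.
theta_prev = np.array([1.0, 0.0, 0.0])   # solution found on the old task
theta = np.array([1.0, 0.5, -0.5])       # current parameters
precision = np.array([10.0, 0.1, 0.1])   # hypothetical per-parameter importance
basis = np.array([[1.0], [0.0], [0.0]])  # protected direction (orthonormal)

new_task_grad = np.array([2.0, 1.0, -1.0])

# (i) Regularized gradient: new-task gradient plus the penalty gradient.
reg_grad = new_task_grad + quadratic_penalty_grad(theta, theta_prev, precision)

# (ii) Projected gradient: the step cannot move along protected directions.
proj_grad = project_out(new_task_grad, basis)

print("regularized gradient:", reg_grad)
print("projected gradient:  ", proj_grad)
```

In this toy case the regularizer shapes where optimization ends up but still allows a step along the protected coordinate, while the projection forbids that step outright. The abstract describes NCL as combining the two ideas, using the Fisher information metric rather than the simple diagonal and axis-aligned choices assumed here.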


