Continual learning using hash-routed convolutional neural networks

10/09/2020
by Ahmad Berjaoui, et al.

Continual learning could shift the machine learning paradigm from data-centric to model-centric. A continual learning model needs to scale efficiently to handle semantically different datasets, while avoiding unnecessary growth. We introduce hash-routed convolutional neural networks: a group of convolutional units through which data flows dynamically. Feature maps are compared using feature hashing, and similar data is routed to the same units. A hash-routed network provides excellent plasticity thanks to its routed nature, while generating stable features through the use of orthogonal feature hashing. Each unit evolves separately, and new units can be added (to be used only when necessary). Hash-routed networks achieve excellent performance across a variety of typical continual learning benchmarks without storing raw data, and train using only gradient descent. Besides providing a continual learning framework for supervised tasks with encouraging results, our model can be used for unsupervised or reinforcement learning.
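The core routing idea can be illustrated with a minimal sketch: hash each incoming feature map into a compact vector via a fixed orthogonal projection, then dispatch it to the unit whose running hash signature is most similar. All names here (`feature_hash`, `route`, the dimensions, and the random unit signatures) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

HASH_DIM = 16   # size of the hash vector (assumed)
NUM_UNITS = 4   # number of convolutional units (assumed)
FEAT_DIM = 64   # flattened feature-map size, e.g. an 8x8 map (assumed)

# Orthogonal feature hashing: a projection with orthonormal columns,
# obtained here via QR decomposition of a random Gaussian matrix.
projection, _ = np.linalg.qr(rng.standard_normal((FEAT_DIM, HASH_DIM)))

def feature_hash(feature_map):
    """Project a flattened feature map to a unit-norm hash vector."""
    v = feature_map.reshape(-1)
    h = v @ projection
    return h / (np.linalg.norm(h) + 1e-8)

# Each unit keeps a signature summarizing the data it has processed;
# initialized randomly here for illustration.
unit_signatures = [rng.standard_normal(HASH_DIM) for _ in range(NUM_UNITS)]
unit_signatures = [s / np.linalg.norm(s) for s in unit_signatures]

def route(feature_map):
    """Send a feature map to the unit with the most similar hash signature."""
    h = feature_hash(feature_map)
    similarities = [float(h @ s) for s in unit_signatures]
    return int(np.argmax(similarities))
```

Because the hash is normalized, positively scaled copies of the same feature map produce identical hashes and are routed to the same unit, which is what makes the routing stable across semantically similar inputs.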

