A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning

by Zhi Wang, et al.

While reinforcement learning (RL) algorithms achieve state-of-the-art performance in various challenging tasks, they easily suffer from catastrophic forgetting or interference when faced with lifelong streaming information. In this paper, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from being perturbed. We use a Dirichlet process mixture to model the non-stationary task distribution; it captures task relatedness by estimating the likelihood of task-to-cluster assignments and clusters the task models in a latent space. We formulate the prior distribution of the mixture as a Chinese restaurant process (CRP) that instantiates new mixture components as needed. The update and expansion of the mixture are governed by the Bayesian non-parametric framework with an expectation-maximization (EM) procedure, which dynamically adapts the model complexity without explicit task boundaries or heuristics. Moreover, we use domain randomization to train robust prior parameters for initializing each task model in the mixture, so that the resulting model can better generalize and adapt to unseen tasks. With extensive experiments on robot navigation and locomotion domains, we show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
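To make the CRP prior concrete, here is a minimal sketch of how a standard Chinese restaurant process weighs existing mixture components against a new one, and how that prior could be combined with per-cluster likelihoods to obtain task-to-cluster assignment probabilities. This is a generic textbook construction, not the authors' implementation: the function names `crp_prior` and `assignment_posterior`, the concentration parameter `alpha`, and the placeholder log-likelihood values are all assumptions made for illustration.

```python
import numpy as np


def crp_prior(counts, alpha):
    """Standard CRP prior over K existing clusters plus one new cluster.

    counts: number of tasks already assigned to each existing cluster
            (assumed positive here).
    alpha:  concentration parameter (hypothetical value chosen by the user).
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum() + alpha
    # Existing cluster k gets weight n_k; a new cluster gets weight alpha.
    return np.append(counts, alpha) / total


def assignment_posterior(counts, alpha, log_likelihoods):
    """Posterior over K existing clusters and one new cluster.

    log_likelihoods: length K + 1; entry k is the log-likelihood of the
    current task data under cluster k, and the last entry is the
    log-likelihood under a freshly initialized model (placeholder values).
    """
    log_post = np.log(crp_prior(counts, alpha)) + np.asarray(log_likelihoods, dtype=float)
    log_post -= log_post.max()  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()


# Two existing clusters holding 3 and 1 tasks, alpha = 1.0.
prior = crp_prior([3, 1], alpha=1.0)
# Illustrative log-likelihoods that favour a new component.
posterior = assignment_posterior([3, 1], 1.0, [-5.0, -4.0, -1.0])
```

In this sketch a new component would be instantiated when the last entry of the posterior is the largest; how the paper's EM procedure actually makes that decision is not specified in the abstract.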




Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

A central capability of a long-lived reinforcement learning (RL) agent i...

Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization

A key challenge of continual reinforcement learning (CRL) in dynamic env...

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

Meta-reinforcement learning enables artificial agents to learn from rela...

Dirichlet process mixture models for non-stationary data streams

In recent years, we have seen a handful of work on inference algorithms ...

Locally Constrained Policy Optimization for Online Reinforcement Learning in Non-Stationary Input-Driven Environments

We study online Reinforcement Learning (RL) in non-stationary input-driv...

Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process

Recent research efforts in lifelong learning propose to grow a mixture o...

ABC Reinforcement Learning

This paper introduces a simple, general framework for likelihood-free Ba...
