Unsupervised Curricula for Visual Meta-Reinforcement Learning

by Allan Jabri, et al.

In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast reinforcement learning (RL) strategies that transfer to similar tasks. However, current meta-RL approaches rely on manually defined distributions of training tasks, and hand-crafting these task distributions can be challenging and time-consuming. Can "useful" pre-training tasks be discovered in an unsupervised manner? We develop an unsupervised algorithm for inducing an adaptive meta-training task distribution, i.e., an automatic curriculum, by modeling unsupervised interaction in a visual environment. The task distribution is scaffolded by a parametric density model of the meta-learner's trajectory distribution. We formulate unsupervised meta-RL as information maximization between a latent task variable and the meta-learner's data distribution, and describe a practical instantiation that alternates between integrating recent experience into the task distribution and meta-learning the updated tasks. Repeating this procedure leads to iterative reorganization, so that the curriculum adapts as the meta-learner's data distribution shifts. In particular, we show how discriminative clustering for visual representation can support trajectory-level task acquisition and exploration in domains with pixel observations, avoiding the pitfalls of alternative approaches. In experiments on vision-based navigation and manipulation domains, we show that the algorithm enables unsupervised meta-learning that transfers to downstream tasks specified by hand-crafted reward functions, and that it serves as pre-training for more efficient supervised meta-learning of test task distributions.
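A minimal sketch of the information-maximization idea, assuming each trajectory has already been mapped to a fixed-length embedding (the pixel encoder and the meta-learner itself are omitted). Here a diagonal Gaussian mixture fit by EM stands in for the parametric density model: each component z plays the role of a task, and the reward log q(s|z) - log q(s) is the pointwise mutual information between task and state, whose expectation under the model equals I(s; z). The function names (fit_diag_gmm, task_reward) are hypothetical, and the paper's actual model, which uses discriminative clustering over visual features, and any reward weighting it applies may differ.

```python
import numpy as np


def logsumexp(a, axis, keepdims=False):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)


def log_gaussian(x, means, var):
    """Per-component diagonal Gaussian log-density, shape (n, k)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / var[None] + np.log(2 * np.pi * var[None]), axis=2)


def fit_diag_gmm(x, k, iters=50, seed=0, min_var=1e-3):
    """Fit q(s) = sum_z p(z) q(s|z) to embeddings x of shape (n, d) with EM."""
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    # Farthest-point initialisation so each component starts in a different region.
    idx = [int(rng.integers(n))]
    for _ in range(k - 1):
        d2 = np.min(((x[:, None, :] - x[idx][None]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(d2)))
    means = x[idx].copy()
    var = np.tile(x.var(axis=0) + min_var, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities p(z | s) under the current model.
        log_r = log_gaussian(x, means, var) + np.log(weights)
        resp = np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))
        # M-step: re-estimate mixture weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-8
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        var = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, 0.0) + min_var
    return weights, means, var


def task_reward(x, z, weights, means, var):
    """Pointwise mutual information log q(s|z) - log q(s) for task z."""
    log_cond = log_gaussian(x, means, var)
    log_marginal = logsumexp(log_cond + np.log(weights), axis=1)
    return log_cond[:, z] - log_marginal
```

In the alternating scheme the abstract describes, one would refit the mixture on embeddings from the meta-learner's most recent trajectories, sample tasks z from the fitted weights, meta-train on task_reward for those tasks, and repeat; as the trajectory distribution shifts, the components, and therefore the curriculum, move with it.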

Unsupervised Meta-Learning for Reinforcement Learning

Meta-learning is a powerful tool that builds on multi-task learning to l...

Meta Reinforcement Learning with Task Embedding and Shared Policy

Despite significant progress, deep reinforcement learning (RL) suffers f...

Unsupervised Visual Attention and Invariance for Reinforcement Learning

Vision-based reinforcement learning (RL) is successful, but how to gener...

Unsupervised Meta Learning for One Shot Title Compression in Voice Commerce

Product title compression for voice and mobile commerce is a well studie...

Provable Hierarchy-Based Meta-Reinforcement Learning

Hierarchical reinforcement learning (HRL) has seen widespread interest a...

Pre-training as Batch Meta Reinforcement Learning with tiMe

Pre-training is transformative in supervised learning: a large network t...

Discovered Policy Optimisation

Tremendous progress has been made in reinforcement learning (RL) over th...