Learning General World Models in a Handful of Reward-Free Deployments

10/23/2022
by Yingchen Xu, et al.

Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task-agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we introduce the reward-free deployment efficiency setting, a new paradigm for RL research. We then present CASCADE, a novel approach for self-supervised exploration in this new setting. CASCADE seeks to learn a world model by collecting data with a population of agents, using an information-theoretic objective inspired by Bayesian Active Learning. It achieves this by specifically maximizing the diversity of trajectories sampled by the population through a novel cascading objective. We provide theoretical intuition for CASCADE and show that, in a tabular setting, it improves upon naïve approaches that do not account for population diversity. We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter and the DM Control Suite. Code and videos are available at https://ycxuyingchen.github.io/cascade/
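
To make the idea of a cascading, population-aware objective concrete, the toy sketch below contrasts two ways of choosing explorers in a small tabular setting. It is not the CASCADE objective from the paper: as a stand-in, it scores a population by the entropy of its pooled state-visitation counts, and every name in it (`entropy`, `greedy_cascade`, `independent_top_k`, the candidate policies `a`, `b`, `c` and their visit counts) is a hypothetical chosen for illustration.

```python
import math
from typing import Dict, List, Sequence


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in nats) of a vector of visitation counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def greedy_cascade(candidates: Dict[str, Sequence[int]], population_size: int) -> List[str]:
    """Pick explorers one at a time; each pick is scored *given* the earlier picks.

    Assumption: diversity is measured as the entropy of the pooled
    state-visitation counts, a simple stand-in for the paper's objective.
    """
    n_states = len(next(iter(candidates.values())))
    pooled = [0] * n_states
    chosen: List[str] = []
    for _ in range(min(population_size, len(candidates))):
        best_name, best_score = None, -1.0
        for name, visits in candidates.items():
            if name in chosen:
                continue
            score = entropy([p + v for p, v in zip(pooled, visits)])
            if score > best_score:
                best_name, best_score = name, score
        chosen.append(best_name)
        pooled = [p + v for p, v in zip(pooled, candidates[best_name])]
    return chosen


def independent_top_k(candidates: Dict[str, Sequence[int]], k: int) -> List[str]:
    """Naive baseline: rank each explorer on its own, ignoring the others."""
    return sorted(candidates, key=lambda n: entropy(candidates[n]), reverse=True)[:k]


# Hypothetical visit counts over four states for three candidate explorers.
candidates = {
    "a": [5, 5, 0, 0],  # covers states 0 and 1 evenly
    "b": [6, 4, 0, 0],  # nearly redundant with "a"
    "c": [0, 0, 9, 1],  # low entropy alone, but reaches new states
}

print(independent_top_k(candidates, 2))  # ['a', 'b']
print(greedy_cascade(candidates, 2))     # ['a', 'c']
```

In this toy case the independent ranking selects two near-duplicate explorers, while the conditional, one-at-a-time selection prefers the explorer that reaches states the first one missed, which is the kind of population diversity the abstract refers to.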

research
10/28/2021

URLB: Unsupervised Reinforcement Learning Benchmark

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to s...
research
10/03/2022

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

We study the problem of deployment efficient reinforcement learning (RL)...
research
05/12/2020

Planning to Explore via Self-Supervised World Models

Reinforcement learning allows solving complex tasks, however, the learni...
research
06/16/2020

Task-agnostic Exploration in Reinforcement Learning

Efficient exploration is one of the main challenges in reinforcement lea...
research
02/25/2021

Task-Agnostic Morphology Evolution

Deep reinforcement learning primarily focuses on learning behavior, usua...
research
10/22/2020

Batch Exploration with Examples for Scalable Robotic Reinforcement Learning

Learning from diverse offline datasets is a promising path towards learn...
research
06/10/2019

Self-Supervised Exploration via Disagreement

Efficient exploration is a long-standing problem in sensorimotor learnin...
