Discovering and Achieving Goals via World Models

by Russell Mendonca, et al.

How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to both problems that learns a world model from image inputs and uses it to train an explorer policy and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer uses foresight to plan toward unseen, surprising states, which then serve as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot, without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at
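To make the explorer/achiever split described in the abstract concrete, here is a minimal toy sketch. It is not the authors' implementation: the 1-D chain environment, the tabular `TabularWorldModel`, the count-based novelty score (a stand-in for LEXA's surprise-driven exploration), and the greedy one-step achiever (a stand-in for a policy trained on imagined rollouts) are all hypothetical simplifications chosen for illustration.

```python
from collections import defaultdict

# Toy 1-D chain; integer states stand in for image observations (assumption).
N_STATES = 10
ACTIONS = (-1, +1)


def env_step(state, action):
    """Real environment: move left or right, clipped to the chain."""
    return min(max(state + action, 0), N_STATES - 1)


class TabularWorldModel:
    """Hypothetical stand-in for a learned latent world model."""

    def __init__(self):
        self.next_state = {}            # (state, action) -> predicted next state
        self.visits = defaultdict(int)  # state -> number of times reached

    def update(self, state, action, new_state):
        self.next_state[(state, action)] = new_state
        self.visits[new_state] += 1

    def predict(self, state, action):
        # None means this transition has never been observed.
        return self.next_state.get((state, action))

    def novelty(self, state, action):
        # Count-based novelty, used here in place of a surprise objective.
        predicted = self.predict(state, action)
        if predicted is None:
            return float("inf")
        return 1.0 / (1 + self.visits[predicted])


def explorer_action(model, state):
    """Pick the action whose imagined outcome looks most novel."""
    return max(ACTIONS, key=lambda a: model.novelty(state, a))


def achiever_action(model, state, goal):
    """Pick the action whose imagined outcome is closest to the goal."""
    def distance(action):
        predicted = model.predict(state, action)
        # Untried actions are treated as "no movement" (assumption).
        return abs((state if predicted is None else predicted) - goal)
    return min(ACTIONS, key=distance)


def train(steps=200):
    """Unsupervised phase: explore, fit the model, collect discovered goals."""
    model = TabularWorldModel()
    state = 0
    model.visits[state] += 1
    goals = {state}
    for _ in range(steps):
        action = explorer_action(model, state)
        new_state = env_step(state, action)
        model.update(state, action, new_state)
        goals.add(new_state)  # discovered states become practice targets
        state = new_state
    return model, goals


def reach(model, start, goal, max_steps=50):
    """Test phase: try to reach a given goal with no further learning."""
    state = start
    for _ in range(max_steps):
        if state == goal:
            return True
        state = env_step(state, achiever_action(model, state, goal))
    return state == goal
```

Calling `train()` and then `reach(model, 0, 9)` mirrors the two phases in the abstract: goals are discovered without supervision, and a goal is then specified at test time. In this toy setting the achiever needs no training of its own because it queries the tabular model directly; LEXA instead trains a separate goal-conditioned policy.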




Asymmetric self-play for automatic goal discovery in robotic manipulation

We train a single, goal-conditioned policy that can solve many robotic m...

Lipschitz-constrained Unsupervised Skill Discovery

We study the problem of unsupervised skill discovery, whose goal is to l...

Visual Reinforcement Learning with Imagined Goals

For an autonomous agent to fulfill a wide range of user-specified goals ...

How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

Offline goal-conditioned reinforcement learning (GCRL) promises general-...

Zero-Shot Visual Imitation

The current dominant paradigm for imitation learning relies on strong su...

RECON: Rapid Exploration for Open-World Navigation with Latent Goal Models

We describe a robotic learning system for autonomous navigation in diver...

Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation

Video prediction models combined with planning algorithms have shown pro...

