Walk the Random Walk: Learning to Discover and Reach Goals Without Supervision

06/23/2022
by Lina Mezghani, et al.

Learning a diverse set of skills by interacting with an environment without any external supervision is an important challenge. In particular, a goal-conditioned agent that can reach any given state is useful in many applications. We propose a novel method for training such a goal-conditioned agent without any external rewards or domain knowledge. We use random walks to train a reachability network that predicts the similarity between two states. This reachability network is then used to build a goal memory of past observations that are diverse and well-balanced. Finally, we train a goal-conditioned policy network on goals sampled from the goal memory, with rewards computed from the reachability network and the goal memory. All components are updated throughout training as the agent discovers and learns new goals. We apply our method to continuous control navigation and robotic manipulation tasks.
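The first two stages described in the abstract (collecting random-walk data for a reachability model, then filtering observations into a diverse goal memory) can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D point environment, the temporal threshold `k_near`, the similarity threshold, and the helper names `random_walk`, `sample_reachability_pairs`, `toy_similarity`, and `build_goal_memory` are all assumptions made for illustration, and `toy_similarity` is only a stand-in for a trained reachability network.

```python
import numpy as np

def random_walk(n_steps, step_size=0.1, dim=2, seed=0):
    """Simulate a random walk in a hypothetical free-space point environment."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(scale=step_size, size=(n_steps, dim))
    return np.cumsum(steps, axis=0)

def sample_reachability_pairs(traj, k_near=5, n_pairs=256, seed=0):
    """Sample state pairs from one trajectory.

    Assumed labeling rule: a pair is positive (1) when the two states are
    at most k_near time steps apart, negative (0) otherwise. Indices are
    drawn uniformly, so positives are rare; a real pipeline would likely
    rebalance the classes.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(traj), size=n_pairs)
    j = rng.integers(0, len(traj), size=n_pairs)
    labels = (np.abs(i - j) <= k_near).astype(np.int64)
    return traj[i], traj[j], labels

def toy_similarity(a, b, scale=0.5):
    """Stand-in for a trained reachability network: similarity in (0, 1]."""
    return float(np.exp(-np.linalg.norm(a - b) / scale))

def build_goal_memory(states, similarity, threshold=0.5):
    """Keep a state only if it is dissimilar to every goal already stored."""
    memory = []
    for s in states:
        if all(similarity(s, g) < threshold for g in memory):
            memory.append(s)
    return np.array(memory)

traj = random_walk(200)
s_a, s_b, y = sample_reachability_pairs(traj)
goals = build_goal_memory(traj, toy_similarity)
print(len(goals), "goals kept out of", len(traj), "visited states")
```

In this sketch, the pairs `(s_a, s_b, y)` would serve as training data for a binary classifier, and the filtering rule in `build_goal_memory` is one simple way to keep stored goals spread out. How the paper balances the memory and derives the policy reward is not specified in the abstract, so those parts are left out.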

research
11/28/2018

Unsupervised Control Through Non-Parametric Discriminative Rewards

Learning to control an environment without hand-crafted rewards or exper...
research
04/10/2020

Learning to Visually Navigate in Photorealistic Environments Without any Supervision

Learning to navigate in a realistic setting where an agent must rely sol...
research
11/07/2022

C3PO: Learning to Achieve Arbitrary Goals via Massively Entropic Pretraining

Given a particular embodiment, we propose a novel method (C3PO) that lea...
research
11/16/2018

On the Complexity of Exploration in Goal-Driven Navigation

Building agents that can explore their environments intelligently is a c...
research
11/01/2022

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

Goal-conditioned reinforcement learning (RL) is a promising direction fo...
research
04/01/2022

Traversability, Reconfiguration, and Reachability in the Gadget Framework

Consider an agent traversing a graph of "gadgets", each with local state...
research
06/16/2018

Avoidance Markov Metrics and Node Pivotality Ranking

We introduce the avoidance Markov metrics and theories which provide mor...
