Accelerating exploration and representation learning with offline pre-training

03/31/2023
by Bogdan Mazoure, et al.

Sequential decision-making agents struggle with long-horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge through improved credit assignment, added memory capability, or changes to the agent's intrinsic motivation (i.e., exploration) or its worldview (i.e., knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward, separately but from a single collection of human demonstrations, can significantly improve sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.
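To make the phrase "noise-contrastive estimation" concrete, here is a minimal InfoNCE-style sketch in NumPy. It is an illustration only, not the authors' implementation: the abstract does not specify the encoder, how positive pairs are formed, or any hyperparameters. The function name `info_nce_loss`, the `temperature` value, the use of random vectors as stand-in state embeddings, and the choice of temporally adjacent states as positives are all assumptions made for this example.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (hypothetical helper, not from the paper).

    Row i of `positives` is treated as the positive example for row i of
    `anchors`; every other row in the batch serves as a negative.
    """
    # L2-normalise so the dot product becomes a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarity matrix of shape (batch, batch), scaled by temperature.
    logits = (a @ p.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The "correct class" for row i is column i (the matching positive).
    return float(-np.mean(np.diag(log_probs)))

# Stand-in embeddings: random vectors play the role of encoded states (assumption).
rng = np.random.default_rng(0)
states = rng.normal(size=(8, 16))
next_states = states + 0.01 * rng.normal(size=(8, 16))  # near-duplicates as positives

aligned_loss = info_nce_loss(states, next_states)
# Shifting the rows by one pairs each anchor with the wrong positive.
mismatched_loss = info_nce_loss(states, np.roll(next_states, 1, axis=0))
```

In this toy setup the loss is low when each anchor is paired with its true successor and much higher when the pairing is shifted, which is the signal a contrastive objective uses to shape a representation.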

research
01/27/2022

The Challenges of Exploration for Offline Reinforcement Learning

Offline Reinforcement Learning (ORL) enables us to separately study the t...
research
06/11/2021

Offline Reinforcement Learning as Anti-Exploration

Offline Reinforcement Learning (RL) aims at learning an optimal control ...
research
12/01/2021

Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation

Complex sequential tasks in continuous-control settings often require ag...
research
06/23/2020

Show me the Way: Intrinsic Motivation from Demonstrations

The study of exploration in Reinforcement Learning (RL) has a long histo...
research
10/13/2021

OPEn: An Open-ended Physics Environment for Learning Without a Task

Humans have mental models that allow them to plan, experiment, and reaso...
research
10/12/2022

Contrastive introspection (ConSpec) to rapidly identify invariant steps for success

Reinforcement learning (RL) algorithms have achieved notable success in ...
research
11/22/2021

A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning

Representation learning lies at the heart of the empirical success of de...
