Self-Educated Language Agent With Hindsight Experience Replay For Instruction Following

by Geoffrey Cideron et al.

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. These properties make it a natural fit to guide the training of interactive agents, as it could ease recurrent challenges in Reinforcement Learning such as sample complexity, generalization, or multi-tasking. Yet, it remains an open problem to relate language and RL in even simple instruction-following scenarios. Current methods rely on expert demonstrations, auxiliary losses, or inductive biases in neural architectures. In this paper, we propose an orthogonal approach called Textual Hindsight Experience Replay (THER) that extends the Hindsight Experience Replay approach to the language setting. Whenever the agent does not fulfill its instruction, THER learns to output a new directive that matches the agent's trajectory, and it relabels the episode with a positive reward. To do so, THER learns to map a state into an instruction by using past successful trajectories, which removes the need for external expert interventions to relabel episodes as in vanilla HER. We observe that this simple idea also initiates a learning synergy between language acquisition and policy learning on instruction-following tasks in the BabyAI environment.
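The relabeling idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the instruction generator here is a toy lookup table trained on successful episodes, whereas THER learns a neural state-to-instruction model; the episode fields (`instruction`, `final_state`, `success`) are hypothetical names chosen for this sketch.

```python
def relabel_episode(episode, instruction_generator):
    """THER-style relabeling: if the episode failed its instruction,
    generate an instruction describing what was actually achieved and
    store the episode as a success under that hindsight goal."""
    if episode["success"]:
        return episode  # successful episodes are stored unchanged
    achieved = instruction_generator(episode["final_state"])
    if achieved is None:
        return None  # generator cannot describe this trajectory; discard
    return {
        "instruction": achieved,            # hindsight instruction
        "transitions": episode["transitions"],
        "final_state": episode["final_state"],
        "success": True,                    # positive reward under the new goal
    }

class LookupGenerator:
    """Toy stand-in for the learned state-to-instruction mapper,
    fit only on past successful (state, instruction) pairs."""
    def __init__(self):
        self.mapping = {}

    def fit(self, state, instruction):
        self.mapping[state] = instruction

    def __call__(self, state):
        return self.mapping.get(state)

gen = LookupGenerator()
# A past successful episode teaches the generator one mapping.
gen.fit("agent_at_red_door", "go to the red door")

failed = {"instruction": "go to the blue ball",
          "transitions": [],
          "final_state": "agent_at_red_door",
          "success": False}
relabeled = relabel_episode(failed, gen)
print(relabeled["instruction"])  # → go to the red door
```

The key difference from vanilla HER is that the hindsight goal is produced by a model trained from the agent's own successes rather than supplied by the environment or an external expert, which is what allows language acquisition and policy learning to bootstrap each other.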
