Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling

06/12/2020
by Russell Mendonca, et al.
UC Berkeley
Stanford University

Reinforcement learning algorithms can acquire policies for complex tasks autonomously. However, the number of samples required to learn a diverse set of skills can be prohibitively large. While meta-reinforcement learning methods have enabled agents to leverage prior experience to adapt quickly to new tasks, their performance depends crucially on how close the new task is to the previously experienced tasks. Current approaches either fail to extrapolate well, or extrapolate only at the cost of extremely large amounts of data for on-policy meta-training. In this work, we present model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is both efficient and able to extrapolate well to out-of-distribution tasks at test time. Our method is based on a simple insight: dynamics models can be adapted efficiently and consistently with off-policy data, more easily than policies and value functions. These dynamics models can then be used to generate synthetic experience for the new task, which allows training of policies and value functions to continue on out-of-distribution tasks without using meta-reinforcement learning at all.
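
The two ideas named in the abstract, identifying a model of the new task from a little off-policy data and then relabeling stored experience with that model, can be illustrated with a toy sketch. This is not the authors' implementation: the linear model, the additive context vector `phi`, the function names, and all dimensions and step sizes are assumptions made only for this illustration.

```python
import numpy as np

def predict(W, phi, states, actions):
    """Predict [next_state, reward] from (state, action) under task context phi."""
    X = np.hstack([states, actions])
    return X @ W + phi

def adapt_context(W, phi0, states, actions, targets, lr=0.25, steps=50):
    """Model identification: adapt only the context vector by gradient
    descent on the mean squared prediction error (W stays fixed)."""
    phi = phi0.copy()
    for _ in range(steps):
        err = predict(W, phi, states, actions) - targets
        phi -= lr * 2.0 * err.mean(axis=0)  # gradient of the loss w.r.t. phi
    return phi

def relabel(W, phi, states, actions):
    """Experience relabeling: replace the next states and rewards of stored
    transitions with the adapted model's predictions."""
    pred = predict(W, phi, states, actions)
    return pred[:, :-1], pred[:, -1]

rng = np.random.default_rng(0)
state_dim, action_dim = 3, 2

# Hypothetical stand-ins: W plays the role of meta-trained model weights,
# phi_new is the unknown context of the new task.
W = rng.normal(size=(state_dim + action_dim, state_dim + 1))
phi_new = rng.normal(size=state_dim + 1)

# A small batch of off-policy transitions collected on the new task.
s_new = rng.normal(size=(16, state_dim))
a_new = rng.normal(size=(16, action_dim))
y_new = predict(W, phi_new, s_new, a_new)

phi_hat = adapt_context(W, np.zeros(state_dim + 1), s_new, a_new, y_new)

# Old replay-buffer states and actions from training tasks, relabeled
# so that they look like experience from the new task.
s_old = rng.normal(size=(100, state_dim))
a_old = rng.normal(size=(100, action_dim))
next_states, rewards = relabel(W, phi_hat, s_old, a_old)
```

In this sketch the relabeled tuples `(s_old, a_old, rewards, next_states)` would then be handed to any standard off-policy learner to continue training the policy and value function for the new task.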


10/06/2022

Distributionally Adaptive Meta Reinforcement Learning

Meta-reinforcement learning algorithms provide a data-driven way to acqu...
02/04/2022

A Discourse on MetODS: Meta-Optimized Dynamical Synapses for Meta-Reinforcement Learning

Recent meta-reinforcement learning work has emphasized the importance of...
03/03/2023

Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning

Reinforcement learning has shown great potential in solving complex task...
11/04/2017

Composing Meta-Policies for Autonomous Driving Using Hierarchical Deep Reinforcement Learning

Rather than learning new control policies for each new task, it is possi...
05/31/2019

Reinforcement Learning Experience Reuse with Policy Residual Representation

Experience reuse is key to sample-efficient reinforcement learning. One ...
09/10/2020

Importance Weighted Policy Learning and Adaption

The ability to exploit prior experience to solve novel problems rapidly ...
11/15/2018

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

Learning policies on data synthesized by models can in principle quench ...
