Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

by Mingyang Wang, et al.

Meta-reinforcement learning enables artificial agents to learn from related training tasks and adapt efficiently to new tasks with minimal interaction data. However, most existing research is still limited to narrow task distributions that are parametric and stationary, and does not consider out-of-distribution tasks during evaluation, which restricts its applicability. In this paper, we propose MoSS, a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning, to address this challenge. We extend meta-RL to broad non-parametric task distributions, which have not been explored before, and also achieve state-of-the-art results on non-stationary and out-of-distribution tasks. Specifically, MoSS consists of a task inference module and a policy module. We use a Gaussian mixture model for task representation to imitate both parametric and non-parametric task variations. Additionally, our online adaptation strategy enables the agent to react at the first sight of a task change, which makes it applicable to non-stationary tasks. MoSS also exhibits strong generalization robustness on out-of-distribution tasks, which stems from its reliable and robust task representation. The policy is built on top of an off-policy RL algorithm, and the entire network is trained completely off-policy to ensure high sample efficiency. On the MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of asymptotic performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization robustness on broad and diverse task distributions.
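The abstract does not say how the Gaussian mixture task representation is parameterized. As a rough illustration only, the sketch below assumes a task encoder that outputs mixture logits plus per-component means and log standard deviations over a D-dimensional latent; a task latent z is drawn by picking a component and then sampling from that diagonal Gaussian, and its log-density under the full mixture is computed with log-sum-exp. The function names, argument shapes, and the diagonal-covariance choice are all hypothetical assumptions, not the MoSS implementation.

```python
import numpy as np

def sample_task_latent(logits, means, log_stds, rng):
    """Draw one task latent z from a diagonal-Gaussian mixture (hypothetical sketch).

    logits:   (K,)   unnormalized mixture weights, assumed to come from a task encoder
    means:    (K, D) component means
    log_stds: (K, D) component log standard deviations
    """
    # Softmax over components gives the mixture weights pi_k.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Pick a component k ~ Categorical(pi), then z = mu_k + sigma_k * eps with eps ~ N(0, I).
    k = rng.choice(len(weights), p=weights)
    eps = rng.standard_normal(means.shape[1])
    z = means[k] + np.exp(log_stds[k]) * eps
    return z, weights

def mixture_log_prob(z, weights, means, log_stds):
    """log p(z) = log sum_k pi_k N(z; mu_k, diag(sigma_k^2)), computed stably."""
    # Per-component diagonal Gaussian log-density, summed over latent dimensions.
    comp = -0.5 * (((z - means) / np.exp(log_stds)) ** 2
                   + 2.0 * log_stds + np.log(2.0 * np.pi)).sum(axis=1)
    a = np.log(weights) + comp
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

# Usage with made-up encoder outputs: 3 components over a 4-dimensional latent.
rng = np.random.default_rng(0)
logits = np.array([0.0, 1.0, -1.0])
means = rng.standard_normal((3, 4))
log_stds = np.full((3, 4), -1.0)
z, w = sample_task_latent(logits, means, log_stds, rng)
lp = mixture_log_prob(z, w, means, log_stds)
```

Only the Gaussian draw inside the chosen component is reparameterized here; the discrete component choice is not differentiable, so a trainable version would need a different estimator or a relaxation, which is left out of this sketch.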
