Learning from Guided Play: Improving Exploration for Adversarial Imitation Learning with Simple Auxiliary Tasks

by Trevor Ablett, et al.

Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naive approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows for the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behaviour cloning, while also being more expert sample efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.



Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

Effective exploration continues to be a significant challenge that preve...

Reinforced Imitation Learning by Free Energy Principle

Reinforcement Learning (RL) requires a large amount of exploration espec...

Fighting Failures with FIRE: Failure Identification to Reduce Expert Burden in Intervention-Based Learning

Supervised imitation learning, also known as behavior cloning, suffers f...

A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration

We pose a new question: Can agents learn how to combine actions from pre...

Optimism is All You Need: Model-Based Imitation Learning From Observation Alone

This paper studies Imitation Learning from Observations alone (ILFO) whe...

SQIL: Imitation Learning via Regularized Behavioral Cloning

Learning to imitate expert behavior given action demonstrations containi...

Imitating Unknown Policies via Exploration

Behavioral cloning is an imitation learning technique that teaches an ag...
