Off-policy Imitation Learning from Visual Inputs

11/08/2021
by Zhihao Cheng, et al.

Imitation learning (IL) from expert states has recently seen a number of successful applications. A different setting, IL from visual inputs (ILfVI), holds greater promise for real-world use because it can draw on online visual resources, but it suffers from low data efficiency and poor performance, which result from on-policy learning and high-dimensional visual inputs. We propose OPIfVI (Off-Policy Imitation from Visual Inputs), which combines off-policy learning, data augmentation, and encoder techniques to tackle these challenges. Specifically, to improve data efficiency, OPIfVI conducts IL in an off-policy manner, so that sampled data can be used multiple times. We also stabilize OPIfVI with spectral normalization to mitigate the side effects of off-policy training. We attribute the poor performance of ILfVI primarily to the agent's inability to extract meaningful features from visual inputs. Hence, OPIfVI employs data augmentation from computer vision to help train encoders that extract better features from visual inputs. In addition, a specific gradient backpropagation structure is designed for the encoder to stabilize its training. Finally, through extensive experiments on the DeepMind Control Suite, we demonstrate that OPIfVI achieves expert-level performance and outperforms existing baselines whether visual demonstrations or visual observations are provided.
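The abstract does not spell out how off-policy data reuse and augmentation fit together, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a fixed-capacity replay buffer and a random-crop augmentation applied at sampling time. The names `ReplayBuffer` and `random_crop`, the choice of random cropping, and the plain nested-list image format are all hypothetical and chosen for illustration.

```python
import random
from collections import deque


def random_crop(image, out_size, rng):
    """Return a random out_size x out_size window of a 2D image (list of rows).

    Random cropping is an assumed augmentation; the abstract only says
    "data augmentation from computer vision".
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - out_size)   # inclusive bounds
    left = rng.randint(0, width - out_size)
    return [row[left:left + out_size] for row in image[top:top + out_size]]


class ReplayBuffer:
    """Hypothetical fixed-capacity transition store; oldest entries are evicted."""

    def __init__(self, capacity, seed=0):
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, obs, action, next_obs):
        self.storage.append((obs, action, next_obs))

    def sample(self, batch_size, crop_size):
        # Sampling with replacement: a stored transition can be drawn in
        # many different updates, which is the data reuse that off-policy
        # training allows.
        batch = [self.rng.choice(self.storage) for _ in range(batch_size)]
        # Each draw gets a fresh crop, so repeated transitions still
        # present the encoder with slightly different views.
        return [
            (random_crop(obs, crop_size, self.rng),
             action,
             random_crop(next_obs, crop_size, self.rng))
            for obs, action, next_obs in batch
        ]
```

In this sketch, augmentation happens when a batch is drawn rather than when a transition is stored, so a single stored frame can yield many distinct training views. Spectral normalization and the encoder gradient structure mentioned in the abstract are not shown here.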

