A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning

11/06/2018
by Jonathan Lee, et al.

On-policy imitation learning algorithms such as DAgger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for DAgger. Cheng and Boots (2018) consider the more realistic model for robotics where the underlying trajectory distribution, which is a function of the policy, is dynamic, and show that convergence can still be proven when a condition on the rate of change of the trajectory distributions is satisfied. In this paper, we reframe that result using dynamic regret theory from the field of Online Optimization to prove convergence to locally optimal policies for DAgger, Imitation Gradient, and Multiple Imitation Gradient. These results inspire a new algorithm, Adaptive On-Policy Regularization (AOR), that ensures the conditions for convergence. We present simulation results with cart-pole balancing and walker locomotion benchmarks that suggest AOR can significantly decrease dynamic regret and chattering. To our knowledge, this is the first application of dynamic regret theory to imitation learning.
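The loop the abstract describes (execute the policy, measure on-policy loss, query the supervisor for corrective labels, refit) is concrete enough to sketch in code. In online optimization terms, dynamic regret compares the learner's cumulative loss, sum_t f_t(theta_t), against the sum of per-round minima, sum_t min_theta f_t(theta), and adaptive regularization keeps the policy sequence changing slowly enough for that gap to shrink. Below is a minimal, self-contained Python sketch of a DAgger-style loop with an adaptive regularizer in the spirit of AOR. The double-integrator dynamics, the linear supervisor gain K_SUP, and the rule "double alpha whenever the on-policy loss rises" are illustrative assumptions for this sketch, not the paper's algorithm, benchmarks, or tuning rule.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy double-integrator dynamics x_{t+1} = A x_t + B u_t + noise.
# The task and the supervisor gain are stand-ins, not the paper's benchmarks.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([0.0, 0.1])
K_SUP = np.array([-3.0, -2.5])  # supervisor's stabilizing linear feedback gain

def supervisor_action(x):
    """Corrective label: the supervisor's action at the learner's state."""
    return float(K_SUP @ x)

def rollout(policy, horizon=50):
    """Execute the learner's policy; return visited states and the average
    on-policy loss against the supervisor's actions at those states."""
    xs, loss = [], 0.0
    x = rng.normal(size=2) * 0.5
    for _ in range(horizon):
        u = float(policy.predict(x[None, :])[0])
        loss += (u - supervisor_action(x)) ** 2
        xs.append(x.copy())
        x = A @ x + B * u + rng.normal(size=2) * 0.01
    return np.array(xs), loss / horizon

# Seed with a few offline supervisor demonstrations (behavior-cloning data).
X = rng.normal(size=(20, 2))
y = X @ K_SUP
alpha, prev_loss = 0.1, np.inf
for it in range(15):
    policy = Ridge(alpha=alpha).fit(X, y)  # regularized policy update
    xs, loss = rollout(policy)
    X = np.vstack([X, xs])                 # DAgger-style data aggregation
    y = np.concatenate([y, xs @ K_SUP])    # supervisor labels on visited states
    if loss > prev_loss:   # loss rose between iterations: damp policy change
        alpha *= 2.0       # (crude stand-in for AOR's adaptive regularizer)
    prev_loss = loss
    print(f"iter {it:2d}  on-policy loss {loss:.4f}  alpha {alpha:.2f}")
```

Raising the regularizer shrinks how far the refit policy can move between iterations, which mirrors the mechanism the abstract describes: slowing the rate of change of the policy, and hence of the trajectory distribution it induces, so the convergence condition can be met.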

Related research:

07/08/2019
On-Policy Robot Imitation Learning from a Converging Supervisor
Existing on-policy imitation learning algorithms, such as DAgger, assume...

11/02/2010
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Sequential prediction problems such as imitation learning, where future...

07/29/2022
Improved Policy Optimization for Online Imitation Learning
We consider online imitation learning (OIL), where the task is to find a...

12/03/2019
Continuous Online Learning and New Insights to Online Imitation Learning
Online learning is a powerful tool for analyzing iterative algorithms. H...

01/22/2018
Convergence of Value Aggregation for Imitation Learning
Value aggregation is a general framework for solving imitation learning...

10/15/2018
Predictor-Corrector Policy Optimization
We present a predictor-corrector framework, called PicCoLO, that can tra...

11/14/2022
Follow the Clairvoyant: an Imitation Learning Approach to Optimal Control
We consider control of dynamical systems through the lens of competitive...
