Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control

09/19/2022
by   Quentin Le Lidec, et al.
11

Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages. On one hand, RL approaches are able to learn global control policies directly from data, but generally require large sample sizes to properly converge towards feasible policies. On the other hand, TO methods are able to exploit gradient-based information extracted from simulators to quickly converge towards a locally optimal control trajectory which is only valid within the vicinity of the solution. Over the past decade, several approaches have aimed to adequately combine the two classes of methods in order to obtain the best of both worlds. Following on from this line of research, we propose several improvements on top of these approaches to learn global control policies quicker, notably by leveraging sensitivity information stemming from TO methods via Sobolev learning, and augmented Lagrangian techniques to enforce the consensus between TO and policy learning. We evaluate the benefits of these improvements on various classical tasks in robotics through comparison with existing approaches in the literature.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/21/2019

Combining Benefits from Trajectory Optimization and Deep Reinforcement Learning

Recent breakthroughs both in reinforcement learning and trajectory optim...
research
10/30/2021

Learning Coordinated Terrain-Adaptive Locomotion by Imitating a Centroidal Dynamics Planner

Dynamic quadruped locomotion over challenging terrains with precise foot...
research
08/11/2020

Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning

Deep Reinforcement Learning (RL) has shown great success in learning com...
research
10/21/2019

Policy Optimization for H_2 Linear Control with H_∞ Robustness Guarantee: Implicit Regularization and Global Convergence

Policy optimization (PO) is a key ingredient for reinforcement learning ...
research
02/04/2022

Learning Interpretable, High-Performing Policies for Continuous Control Problems

Gradient-based approaches in reinforcement learning (RL) have achieved t...
research
03/09/2023

Evolving Populations of Diverse RL Agents with MAP-Elites

Quality Diversity (QD) has emerged as a powerful alternative optimizatio...
research
01/30/2023

Passivizing learned policies and learning passive policies with virtual energy tanks in robotics

Within a robotic context, we merge the techniques of passivity-based con...

Please sign up or login with your details

Forgot password? Click here to reset