Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning

05/23/2023
by Sumeet Batra, et al.

Training generally capable agents that perform well in unseen dynamic environments is a long-term goal of robot learning. Quality Diversity Reinforcement Learning (QD-RL) is an emerging class of reinforcement learning (RL) algorithms that blend insights from Quality Diversity (QD) and RL to produce a collection of high-performing and behaviorally diverse policies with respect to a behavioral embedding. Existing QD-RL approaches have thus far relied on sample-efficient off-policy RL algorithms. However, recent advances in high-throughput, massively parallelized robotic simulators have opened the door for algorithms that can exploit such parallelism, and it is unclear how to scale existing off-policy QD-RL methods to these new data-rich regimes. In this work, we take the first steps toward combining QD with on-policy RL methods that can leverage massive parallelism, specifically Proximal Policy Optimization (PPO), and propose a new QD-RL method designed with these high-throughput simulators and on-policy training in mind. Our proposed Proximal Policy Gradient Arborescence (PPGA) algorithm yields a 4x improvement over baselines on the challenging humanoid domain.
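To make the QD-RL vocabulary above concrete, the following is a minimal Python sketch of two generic building blocks: a MAP-Elites-style archive that keeps the best policy found in each cell of a discretized behavioral embedding, and the standard PPO clipped surrogate objective for a single sample. This is not PPGA; the abstract does not describe how PPGA combines QD with PPO. The names `Elite`, `GridArchive`, and `ppo_clip_objective`, the uniform grid discretization, and the use of a plain tuple as a stand-in for policy parameters are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Elite:
    params: Tuple[float, ...]    # hypothetical stand-in for policy parameters
    fitness: float               # e.g. episodic return of the policy
    measures: Tuple[float, ...]  # behavioral embedding of the policy


class GridArchive:
    """Hypothetical MAP-Elites-style archive: one elite per cell of a uniform grid."""

    def __init__(self, bins_per_dim: int, lower: Sequence[float], upper: Sequence[float]):
        # Assumes upper[i] > lower[i] for every behavior dimension.
        self.bins = bins_per_dim
        self.lower = list(lower)
        self.upper = list(upper)
        self.cells: Dict[Tuple[int, ...], Elite] = {}

    def cell_index(self, measures: Sequence[float]) -> Tuple[int, ...]:
        # Map each behavior coordinate to a bin, clamping out-of-range values.
        idx = []
        for m, lo, hi in zip(measures, self.lower, self.upper):
            i = int((m - lo) / (hi - lo) * self.bins)
            idx.append(min(max(i, 0), self.bins - 1))
        return tuple(idx)

    def add(self, params, fitness: float, measures) -> bool:
        # Insert if the cell is empty or the new policy beats the incumbent.
        key = self.cell_index(measures)
        incumbent = self.cells.get(key)
        if incumbent is None or fitness > incumbent.fitness:
            self.cells[key] = Elite(tuple(params), fitness, tuple(measures))
            return True
        return False

    def qd_score(self) -> float:
        # Sum of elite fitnesses (assumes fitness values are non-negative).
        return sum(e.fitness for e in self.cells.values())

    def coverage(self) -> float:
        # Fraction of grid cells that hold an elite.
        return len(self.cells) / (self.bins ** len(self.lower))


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    archive = GridArchive(bins_per_dim=10, lower=[0.0, 0.0], upper=[1.0, 1.0])
    archive.add((0.1,), fitness=1.0, measures=(0.05, 0.05))  # new cell
    archive.add((0.2,), fitness=0.5, measures=(0.06, 0.04))  # same cell, worse: rejected
    archive.add((0.3,), fitness=2.0, measures=(0.07, 0.01))  # same cell, better: replaces
    archive.add((0.4,), fitness=1.5, measures=(0.95, 0.95))  # new cell
    print(len(archive.cells), archive.qd_score(), archive.coverage())
    # 2 3.5 0.02
```

The archive captures the "diverse" half of QD (coverage of the behavior space) and the "quality" half (only the best policy per cell survives), while the clipped objective is what keeps each on-policy PPO update close to the data-collecting policy, which is what lets PPO consume large batches from parallel simulators.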
