Truly Proximal Policy Optimization

03/19/2019
by Yuhui Wang, et al.

Proximal policy optimization (PPO) is one of the most successful deep reinforcement learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from fully understood. In this paper, we show that PPO can neither strictly restrict the probability ratio, as it attempts to do, nor enforce a well-defined trust region constraint, which means that it may still suffer from performance instability. To address this issue, we present an enhanced PPO method, named Trust Region-based PPO with Rollback (TR-PPO-RB). Our method makes two critical improvements: 1) it adopts a new clipping function that supports a rollback behavior to restrict the ratio between the new policy and the old one; 2) it replaces the triggering condition for clipping with a trust region-based one, which is theoretically justified by the trust region theorem. Experiments show that, by adhering more truly to the "proximal" property (restricting the policy within the trust region), the new algorithm improves the original PPO in both stability and sample efficiency.
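To make the contrast between ordinary ratio clipping, a rollback clipping function, and a trust region-based trigger concrete, here is a minimal pure-Python sketch of per-sample surrogate objectives. Only `ppo_clip` is the standard PPO clipped surrogate. The functions `rollback_clip`, `trust_region_rollback` and `categorical_kl`, the rollback slope `alpha`, the KL threshold `delta`, and all default values are hypothetical illustrations of the ideas named above, not a transcription of the paper's equations.

```python
import math
from typing import Sequence

# Illustrative sketch only: the rollback slope `alpha`, the KL threshold
# `delta`, their defaults and the exact piecewise forms are assumptions.


def ppo_clip(ratio: float, adv: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)


def rollback_clip(ratio: float, adv: float, eps: float = 0.2,
                  alpha: float = 0.3) -> float:
    """Rollback variant (assumed form): outside [1 - eps, 1 + eps] the clipped
    branch has slope -alpha in r instead of 0, so once it is active the
    objective pushes the ratio back toward the range instead of ignoring it.
    The branch equals r at r = 1 +/- eps, so the objective stays continuous."""
    if ratio >= 1.0 + eps:
        f = -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    elif ratio <= 1.0 - eps:
        f = -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    else:
        f = ratio
    return min(ratio * adv, f * adv)


def categorical_kl(p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """KL(pi_old || pi_new) for a discrete action distribution at one state."""
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


def trust_region_rollback(ratio: float, adv: float, kl: float,
                          delta: float = 0.03, alpha: float = 0.3) -> float:
    """Trust-region-triggered rollback (assumed form): the clipped branch is
    switched on by the per-state KL reaching `delta`, rather than by r leaving
    [1 - eps, 1 + eps], and only when the policy has already moved in the
    direction the advantage favours, i.e. (r - 1) * A >= 0."""
    if kl >= delta and (ratio - 1.0) * adv >= 0.0:
        # Slope -alpha * A in r, anchored at the old policy's ratio r = 1.
        return (-alpha * ratio + (1.0 + alpha)) * adv
    return ratio * adv
```

With `eps=0.2` and a positive advantage, `ppo_clip` is flat in the ratio beyond 1.2 (zero gradient), so nothing stops further drift once the ratio is outside the range. `rollback_clip` instead has slope `-alpha * adv` there, so gradient ascent actively moves the ratio back, which is one way to read the "rollback behavior" described above. `trust_region_rollback` keys the same idea to a KL condition, computed here with `categorical_kl` for a discrete action space.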
