CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric

10/20/2021
by Yunxiao Guo, et al.

As a deep reinforcement learning algorithm, Proximal Policy Optimization (PPO) performs well in many complex tasks and has become one of the most popular RL algorithms in recent years. According to the penalty mechanism in the surrogate objective, PPO can be divided into PPO with KL divergence (KL-PPO) and PPO with a clip function (Clip-PPO). Clip-PPO is widely used in a variety of practical scenarios and has attracted the attention of many researchers; consequently, many variants have been created, steadily improving the algorithm. KL-PPO, despite its stronger theoretical grounding, has been neglected because its performance lags behind Clip-PPO. In this article, we analyze how the asymmetry of KL divergence affects PPO's objective function and derive an inequality that indicates when this asymmetry impairs the efficiency of KL-PPO. We propose PPO with a Correntropy Induced Metric (CIM-PPO), which applies the theory of correntropy (a symmetric metric widely used in M-estimation to evaluate the difference between two distributions) to PPO. We then design experiments based on OpenAI Gym to test the effectiveness of the new algorithm and compare it with KL-PPO and Clip-PPO.
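For reference, the two surrogate objectives contrasted above, written in the standard notation of Schulman et al.'s PPO paper (not notation taken from this article), are

L^{\mathrm{KL}}(\theta) = \hat{\mathbb{E}}_t\left[ r_t(\theta)\,\hat{A}_t - \beta\, D_{\mathrm{KL}}\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t) \,\big\|\, \pi_{\theta}(\cdot \mid s_t)\big) \right],

L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[ \min\big( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t \big) \right],

where r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t) is the probability ratio and \hat{A}_t the advantage estimate. The asymmetry at issue is that, in general, D_{\mathrm{KL}}(p \| q) \neq D_{\mathrm{KL}}(q \| p), so the KL penalty depends on which of the old and new policies is placed first; a symmetric measure such as the correntropy induced metric removes that dependence.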

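To make the symmetric alternative concrete, here is a minimal sketch of the correntropy induced metric as defined in the correntropy literature, together with the closed-form KL divergence between univariate Gaussians to illustrate the asymmetry. The Gaussian kernel choice, the kernel width sigma, and all function names are illustrative assumptions, not the paper's implementation.

import numpy as np

def correntropy(x, y, sigma=1.0):
    # Sample estimate of correntropy V(X, Y) = E[kappa_sigma(X - Y)]
    # with an (unnormalized) Gaussian kernel kappa_sigma(e) = exp(-e^2 / (2 sigma^2)).
    e = np.asarray(x) - np.asarray(y)
    return np.mean(np.exp(-e**2 / (2.0 * sigma**2)))

def cim(x, y, sigma=1.0):
    # Correntropy induced metric: CIM(X, Y) = sqrt(kappa(0) - V(X, Y)).
    # With the kernel above, kappa(0) = 1. Symmetric by construction.
    return np.sqrt(1.0 - correntropy(x, y, sigma))

def kl_gauss(mu1, s1, mu2, s2):
    # Closed-form KL(N(mu1, s1^2) || N(mu2, s2^2)) for univariate Gaussians.
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2.0 * s2**2) - 0.5

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=10_000)  # e.g. actions sampled from the old policy
y = rng.normal(0.5, 1.5, size=10_000)  # e.g. actions sampled from the new policy

print(kl_gauss(0.0, 1.0, 0.5, 1.5))    # KL(p || q) ...
print(kl_gauss(0.5, 1.5, 0.0, 1.0))    # ... differs from KL(q || p)
print(cim(x, y), cim(y, x))            # CIM is identical in both directions

Because CIM is a genuine metric (symmetric and bounded), a CIM-based penalty is insensitive to the ordering of the old and new policy distributions, which is the property the abstract attributes to CIM-PPO.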
