Soft Actor-Critic with Cross-Entropy Policy Optimization

12/21/2021
by Zhenyang Shi, et al.

Soft Actor-Critic (SAC) is a state-of-the-art off-policy reinforcement learning (RL) algorithm within the maximum-entropy RL framework. SAC has been shown to perform well on a range of continuous control tasks, with good stability and robustness. It learns a stochastic Gaussian policy that maximizes a trade-off between total expected reward and policy entropy. To update the policy, SAC minimizes the KL divergence between the current policy density and the soft value function density, and uses the reparameterization trick to obtain an approximate gradient of this divergence. In this paper, we propose Soft Actor-Critic with Cross-Entropy Policy Optimization (SAC-CEPO), which uses the Cross-Entropy Method (CEM) to optimize the policy network of SAC. The core idea is to use CEM to iteratively sample toward the distribution closest to the soft value function density, and to use the resulting distribution as a target for updating the policy network. To reduce computational complexity, we also introduce a decoupled policy structure that splits the Gaussian policy into one policy that learns the mean and another that learns the deviation, so that only the mean policy is trained by CEM. We show that this decoupled policy structure converges to an optimum, and we demonstrate experimentally that SAC-CEPO achieves performance competitive with the original SAC.
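The CEM step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a fixed state, a diagonal Gaussian over actions, and a toy quadratic function standing in for the soft Q-function, and it simplifies "moving toward the soft value function density" to refitting the Gaussian on the highest-scoring samples. The names `cem_maximize`, `toy_q`, and all hyperparameters are hypothetical. In SAC-CEPO, the resulting distribution would then serve as a target for the mean policy network; that training step is omitted here.

```python
import numpy as np

def cem_maximize(q_fn, mean, std, n_iters=10, n_samples=64, elite_frac=0.2, rng=None):
    """Generic Cross-Entropy Method loop over a diagonal Gaussian (illustrative only)."""
    rng = np.random.default_rng(0) if rng is None else rng
    mean = np.asarray(mean, dtype=float).copy()
    std = np.asarray(std, dtype=float).copy()
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Sample candidate actions from the current Gaussian.
        actions = rng.normal(mean, std, size=(n_samples, mean.size))
        # Score every candidate with the (stand-in) Q-function.
        scores = q_fn(actions)
        # Keep the top-scoring "elite" candidates.
        elite = actions[np.argsort(scores)[-n_elite:]]
        # Refit the Gaussian to the elites; the small constant keeps std positive.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean, std

def toy_q(actions):
    # Hypothetical stand-in for a soft Q-function at one state: peaks at (0.5, -0.3).
    target = np.array([0.5, -0.3])
    return -np.sum((actions - target) ** 2, axis=1)

cem_mean, cem_std = cem_maximize(toy_q, mean=np.zeros(2), std=np.ones(2))
```

On this toy objective the refitted mean drifts toward the maximizer of `toy_q` while the deviation shrinks. In the decoupled structure the abstract describes, only the mean would be taken from such a loop, with the deviation learned by a separate policy.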

research
01/04/2018

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Model-free deep reinforcement learning (RL) algorithms have been demonst...
research
03/25/2019

Q-Learning for Continuous Actions with Cross-Entropy Guided Policies

Off-Policy reinforcement learning (RL) is an important class of methods ...
research
06/09/2020

AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation

Entropy is ubiquitous in machine learning, but it is in general intracta...
research
03/08/2023

Soft Actor-Critic Algorithm with Truly Inequality Constraint

Soft actor-critic (SAC) in reinforcement learning is expected to be one ...
research
10/08/2019

Deep Value Model Predictive Control

In this paper, we introduce an actor-critic algorithm called Deep Value ...
research
01/28/2022

Do You Need the Entropy Reward (in Practice)?

Maximum entropy (MaxEnt) RL maximizes a combination of the original task...
research
10/11/2021

Bid Optimization using Maximum Entropy Reinforcement Learning

Real-time bidding (RTB) has become a critical way of online advertising....
