Band-limited Soft Actor Critic Model

06/19/2020
by Miguel Campo, et al.

Soft Actor Critic (SAC) algorithms show remarkable performance in complex simulated environments. A key element of SAC networks is entropy regularization, which prevents the SAC actor from optimizing against fine-grained, often transient, features of the state-action value function. This results in better sample efficiency during early training. We take this idea one step further by artificially band-limiting the spatial resolution of the target critic through the addition of a convolutional filter. We derive the closed-form solution in the linear case and show that band-limiting reduces the interdependency between the low- and high-frequency components of the state-action value approximation, allowing the critic to learn faster. In experiments, the band-limited SAC outperformed the classic twin-critic SAC in a number of Gym environments and displayed more stable returns. We also derive novel insights about SAC by adding a stochastic noise disturbance, a technique that is increasingly used to learn robust policies that transfer well from simulated environments to their real-world counterparts.
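To make the band-limiting idea concrete, here is a minimal NumPy sketch of the low-pass effect a convolutional filter has on a critic's value estimates. It assumes a single fixed state, Q-values sampled on a uniform 1-D action grid, and a Gaussian kernel as the filter; the toy Q-function, the kernel width, and the helper names `gaussian_kernel` and `bandlimit` are hypothetical illustrations, not the paper's actual filter, its placement in the target critic, or its network architecture.

```python
import numpy as np

def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps (hypothetical filter choice)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()  # unit DC gain: a constant Q surface passes through unchanged

def bandlimit(q: np.ndarray, kernel: np.ndarray, pad_mode: str = "edge") -> np.ndarray:
    """Low-pass filter Q-values sampled on a uniform action grid.

    The output has the same length as `q`. 'edge' padding suits a bounded
    action range; 'wrap' treats the grid as periodic.
    """
    radius = len(kernel) // 2
    padded = np.pad(q, radius, mode=pad_mode)
    # The kernel is symmetric, so convolution equals correlation here.
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical toy Q(s, a) for one state: a smooth trend plus a fine-grained ripple.
n = 256
a = np.linspace(-np.pi, np.pi, n, endpoint=False)
q = np.sin(a) + 0.3 * np.sin(20 * a)

kernel = gaussian_kernel(sigma=3.0, radius=9)
# Periodic padding makes this a circular convolution, so the DFT gains below are exact.
q_bl = bandlimit(q, kernel, pad_mode="wrap")

# Gain of each component = |DFT(filtered)| / |DFT(original)| at its frequency bin.
spec_in, spec_out = np.fft.rfft(q), np.fft.rfft(q_bl)
gain_low = abs(spec_out[1]) / abs(spec_in[1])     # 1 cycle across the grid
gain_high = abs(spec_out[20]) / abs(spec_in[20])  # 20 cycles across the grid
print(f"low-frequency gain {gain_low:.3f}, high-frequency gain {gain_high:.3f}")
```

A Gaussian with standard deviation sigma (in grid steps) has a frequency response of roughly exp(-sigma^2 * omega^2 / 2), so the smooth trend passes almost untouched while the 20-cycle ripple is damped to about a third of its amplitude. Increasing `sigma` lowers the cutoff and removes more fine-grained structure from the value estimate.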
