Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

06/06/2019
by Patrick Nadeem Ward, et al.

Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle toward hyperparameters as well as being sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm within the maximum entropy RL framework which offers greater stability and empirical gains. The choice of policy distribution, a factored Gaussian, is motivated by its easy re-parametrization rather than its modeling power. We introduce Normalizing Flow policies within the SAC framework that learn more expressive classes of policies than simple factored Gaussians. We also present a series of stabilization tricks that enable effective training of these policies in the RL setting. We show empirically on continuous grid world tasks that our approach increases stability and is better suited to difficult exploration in sparse reward settings.
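
To make the contrast in the abstract concrete, here is a minimal sketch of a normalizing-flow policy head: a reparameterized factored Gaussian sample is pushed through invertible layers, and the log-density is corrected with the change-of-variables formula. The abstract does not say which flow family or stabilization tricks the authors use, so the planar layer below is an assumed, commonly used choice, and all function names (`gaussian_sample`, `planar_flow`, `flow_policy_sample`) are hypothetical.

```python
import math
import random


def gaussian_sample(mu, log_std, rng):
    """Reparameterized factored Gaussian: z = mu + std * eps, plus log N(z)."""
    z, logp = [], 0.0
    for m, ls in zip(mu, log_std):
        eps = rng.gauss(0.0, 1.0)
        z.append(m + math.exp(ls) * eps)
        # log N(z; m, std) written in terms of eps = (z - m) / std
        logp += -0.5 * eps * eps - ls - 0.5 * math.log(2.0 * math.pi)
    return z, logp


def planar_flow(z, u, w, b):
    """One planar layer f(z) = z + u * tanh(w.z + b).

    Returns (f(z), log|det J|). Assumes u.w > -1 so the layer is invertible
    and the determinant stays positive.
    """
    pre = sum(wi * zi for wi, zi in zip(w, z)) + b
    h = math.tanh(pre)
    out = [zi + ui * h for zi, ui in zip(z, u)]
    uw = sum(ui * wi for ui, wi in zip(u, w))
    # det(I + u * psi^T) = 1 + u.psi, with psi = (1 - tanh^2(pre)) * w
    log_det = math.log(abs(1.0 + uw * (1.0 - h * h)))
    return out, log_det


def flow_policy_sample(mu, log_std, layers, rng):
    """Sample an action and its log-probability under the flow policy."""
    a, logp = gaussian_sample(mu, log_std, rng)
    for u, w, b in layers:
        a, log_det = planar_flow(a, u, w, b)
        logp -= log_det  # change of variables: log p(a) = log p(z) - log|det J|
    return a, logp


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [([0.5, -0.2], [1.0, 0.3], 0.1)]  # illustrative (u, w, b) values
    action, log_prob = flow_policy_sample([0.0, 0.0], [-0.5, -0.5], layers, rng)
    print(action, log_prob)
```

In a SAC-style setup the returned log-probability would take the place of the Gaussian log-density in the entropy term; in practice `mu`, `log_std`, and the flow parameters would be network outputs conditioned on the state rather than fixed lists.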

research
04/22/2022

TASAC: a twin-actor reinforcement learning framework with stochastic policy for batch process control

Due to their complex nonlinear dynamics and batch-to-batch variability, ...
research
10/03/2022

Latent State Marginalization as a Low-cost Approach for Improving Exploration

While the maximum entropy (MaxEnt) reinforcement learning (RL) framework...
research
05/17/2021

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Offline Reinforcement Learning promises to learn effective policies from...
research
04/13/2021

TASAC: Temporally Abstract Soft Actor-Critic for Continuous Control

We propose temporally abstract soft actor-critic (TASAC), an off-policy ...
research
05/16/2019

Leveraging exploration in off-policy algorithms via normalizing flows

Exploration is a crucial component for discovering approximately optimal...
research
10/09/2020

Deep RL With Information Constrained Policies: Generalization in Continuous Control

Biological agents learn and act intelligently in spite of a highly limit...
research
11/27/2020

Adaptable Automation with Modular Deep Reinforcement Learning and Policy Transfer

Recent advances in deep Reinforcement Learning (RL) have created unprece...
