NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning

12/21/2018
by Sirui Xie, et al.

Reinforcement learning agents need exploratory behaviors to escape from local optima. These behaviors may include both immediate dithering perturbation and temporally consistent exploration. To achieve both, a stochastic policy model that is inherently consistent over a period of time is desirable, especially for tasks with either sparse rewards or long-term information. In this work, we introduce a novel on-policy temporally consistent exploration strategy, Neural Adaptive Dropout Policy Exploration (NADPEx), for deep reinforcement learning agents. Dropout, modeled as a global random variable for the conditional distribution, is incorporated into reinforcement learning policies, equipping them with inherent temporal consistency even when the reward signals are sparse. We discuss two factors that guarantee stable improvement of the NADPEx policy: the gradients' alignment with the objective and a KL constraint in policy space. Our experiments demonstrate that NADPEx solves tasks with sparse rewards where naive exploration and parameter noise fail. It also achieves comparable or even faster convergence on the standard MuJoCo benchmark for continuous control.
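
The idea of treating dropout as a global random variable can be illustrated with a minimal sketch. It assumes a tiny two-layer policy, a fixed keep probability, and a mask that is sampled once per episode and held for every step of that episode. The class `DropoutPolicy`, the helper `rollout`, and all parameter names are hypothetical and are not taken from the paper; the adaptive dropout rates and the dithering action noise mentioned in the abstract are omitted.

```python
import numpy as np


class DropoutPolicy:
    """Hypothetical two-layer policy with an episode-level dropout mask."""

    def __init__(self, obs_dim, hidden_dim, act_dim, keep_prob=0.9, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w1 = self.rng.normal(scale=0.1, size=(obs_dim, hidden_dim))
        self.w2 = self.rng.normal(scale=0.1, size=(hidden_dim, act_dim))
        self.keep_prob = keep_prob  # assumption: fixed, not learned
        self.mask = np.ones(hidden_dim)

    def resample_mask(self):
        # Draw one Bernoulli mask over hidden units; it acts as the
        # global random variable shared by all steps of an episode.
        draws = self.rng.random(self.mask.shape)
        self.mask = (draws < self.keep_prob).astype(float)

    def act(self, obs):
        # The mask is reused here rather than resampled, so the same
        # sub-network produces every action until the next resample.
        hidden = np.tanh(obs @ self.w1) * self.mask / self.keep_prob
        return hidden @ self.w2


def rollout(policy, observations):
    """Sample the mask once, then act on each observation with it."""
    policy.resample_mask()
    return np.stack([policy.act(obs) for obs in observations])
```

Calling `rollout` again draws a new mask, so behavior varies between episodes but stays consistent within one. Per-step dropout would instead resample inside `act`, which gives only dithering-style perturbation.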

research
12/07/2018

Off-Policy Deep Reinforcement Learning without Exploration

Reinforcement learning traditionally considers the task of balancing exp...
research
02/13/2018

Diversity-Driven Exploration Strategy for Deep Reinforcement Learning

Efficient exploration remains a challenging research problem in reinforc...
research
02/23/2022

Consistent Dropout for Policy Gradient Reinforcement Learning

Dropout has long been a staple of supervised learning, but is rarely use...
research
03/27/2019

Autoregressive Policies for Continuous Control Deep Reinforcement Learning

Reinforcement learning algorithms rely on exploration to discover new be...
research
11/10/2020

Perturbation-based exploration methods in deep reinforcement learning

Recent research on structured exploration placed emphasis on identifying...
research
09/07/2018

Improving On-policy Learning with Statistical Reward Accumulation

Deep reinforcement learning has obtained significant breakthroughs in re...
research
10/01/2022

Deep Intrinsically Motivated Exploration in Continuous Control

In continuous control, exploration is often performed through undirected...