Long N-step Surrogate Stage Reward to Reduce Variances of Deep Reinforcement Learning in Complex Problems

by Junmin Zhong, et al.

High variance in reinforcement learning has been shown to impede convergence and hurt task performance. Because the reward signal plays an important role in learning behavior, multi-step methods have been considered as a way to mitigate the problem, and are believed to be more effective than single-step methods. However, there is a lack of comprehensive and systematic study demonstrating the effectiveness of multi-step methods in solving highly complex continuous control problems. In this study, we introduce a new long N-step surrogate stage (LNSS) reward approach that effectively accounts for complex environment dynamics, whereas previous methods are usually feasible only for a limited number of steps. The LNSS method is simple, has low computational cost, and is applicable to value-based or policy gradient reinforcement learning. We systematically evaluate LNSS in OpenAI Gym and DeepMind Control Suite on complex benchmark environments where DRL has generally struggled to obtain good results. We demonstrate that LNSS improves total reward, convergence speed, and coefficient of variation (CV). We also provide analytical insights on how LNSS exponentially reduces the upper bound on the variance of the Q value relative to the respective single-step method.
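The abstract does not give the LNSS formula, so the following is only a minimal sketch of the two quantities it names: a reward aggregated over N steps, and the coefficient of variation used as an evaluation metric. The function names are hypothetical, and the normalization in `surrogate_stage_reward` (a discount-weighted average over the window) is an assumption made for illustration, not necessarily the paper's definition.

```python
import statistics


def discounted_n_step_sum(rewards, gamma):
    """Discounted sum of a window of rewards: sum_i gamma**i * rewards[i]."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))


def surrogate_stage_reward(rewards, gamma):
    """Hypothetical surrogate stage reward for a window of N rewards.

    ASSUMPTION: the window's discounted sum is divided by the sum of the
    discount weights, so a constant reward stream maps to that constant.
    The abstract does not state the exact form used by LNSS.
    """
    if not rewards:
        raise ValueError("rewards window must not be empty")
    weight_sum = sum(gamma ** i for i in range(len(rewards)))
    return discounted_n_step_sum(rewards, gamma) / weight_sum


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / abs(mean)
```

For example, `coefficient_of_variation` could be applied to the total rewards of several independent training runs; a lower value indicates less run-to-run spread relative to the mean.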




