DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

06/26/2020
by   Rasool Fakoor, et al.
0

This paper prescribes a suite of techniques for off-policy Reinforcement Learning (RL) that simplify the training process and reduce the sample complexity. First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled. This is contrast to existing literature which creates sophisticated off-policy techniques. Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step; existing solutions such as delayed policy updates do not mitigate this issue. Third, we show that ideas in the propensity estimation literature can be used to importance-sample transitions from the replay buffer and selectively update the policy to prevent deterioration of performance. We make these claims using extensive experimentation on a set of challenging MuJoCo tasks. A short video of our results can be seen at https://tinyurl.com/scs6p5m .

READ FULL TEXT
research
09/18/2019

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Improving the sample efficiency in reinforcement learning has been a lon...
research
02/17/2020

Adaptive Experience Selection for Policy Gradient

Policy gradient reinforcement learning (RL) algorithms have achieved imp...
research
05/05/2019

P3O: Policy-on Policy-off Policy Optimization

On-policy reinforcement learning (RL) algorithms have high sample comple...
research
05/04/2023

Rethinking Population-assisted Off-policy Reinforcement Learning

While off-policy reinforcement learning (RL) algorithms are sample effic...
research
01/09/2020

Population-Guided Parallel Policy Search for Reinforcement Learning

In this paper, a new population-guided parallel learning scheme is propo...
research
03/02/2023

The Ladder in Chaos: A Simple and Effective Improvement to General DRL Algorithms by Policy Path Trimming and Boosting

Knowing the learning dynamics of policy is significant to unveiling the ...
research
09/15/2022

On the Reuse Bias in Off-Policy Reinforcement Learning

Importance sampling (IS) is a popular technique in off-policy evaluation...

Please sign up or login with your details

Forgot password? Click here to reset