t-Soft Update of Target Network for Deep Reinforcement Learning

08/25/2020
by   Taisuke Kobayashi, et al.

This paper proposes a new, robust update rule for the target network in deep reinforcement learning, replacing the conventional update rule, which is an exponential moving average. The problem with the conventional rule is that all parameters are smoothly updated at the same speed, even when some of them are being updated in the wrong direction. To update the parameters robustly, the t-soft update, inspired by the Student's t-distribution, is derived from the analogy between the exponential moving average and the normal distribution. In most of the PyBullet robotics simulations, an online actor-critic algorithm with the t-soft update outperformed the conventional methods in terms of the obtained return.
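
For orientation, the conventional rule that the abstract refers to can be sketched as below. This is a minimal illustration of the exponential moving average only, not the paper's t-soft update, whose formula the abstract does not give. The function name `soft_update`, the rate parameter `tau`, and the toy parameter vectors are assumptions made for this example.

```python
import numpy as np


def soft_update(target_params, online_params, tau=0.005):
    """Conventional soft update (illustrative sketch).

    Each target parameter moves toward its online counterpart by the same
    fraction tau, i.e. an exponential moving average of the online network.
    """
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


# Toy example: a single 3-element parameter vector (hypothetical values).
target = [np.zeros(3)]
online = [np.ones(3)]

# With the online parameters held fixed, the target approaches them
# geometrically: after n steps the remaining gap is (1 - tau) ** n.
for _ in range(100):
    target = soft_update(target, online, tau=0.1)
```

Because `tau` is one scalar shared by every parameter, all parameters are updated at the same speed regardless of how plausible each individual update is, which is the limitation the abstract identifies.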

research
02/25/2022

Consolidated Adaptive T-soft Update for Deep Reinforcement Learning

Demand for deep reinforcement learning (DRL) is gradually increased to e...
research
09/22/2021

Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods

In value-based deep reinforcement learning methods, approximation of val...
research
12/20/2022

Variational Quantum Soft Actor-Critic for Robotic Arm Control

Deep Reinforcement Learning is emerging as a promising approach for the ...
research
08/23/2022

An intelligent algorithmic trading based on a risk-return reinforcement learning algorithm

This scientific paper proposes a novel portfolio optimization model using...
research
08/01/2021

A Reinforcement Learning Approach for Scheduling in mmWave Networks

We consider a source that wishes to communicate with a destination at a ...
research
01/22/2019

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

Multi-step methods such as Retrace(λ) and n-step Q-learning have become ...
research
05/29/2023

Towards Constituting Mathematical Structures for Learning to Optimize

Learning to Optimize (L2O), a technique that utilizes machine learning t...
