Reinforcement Learning with Random Delays

10/06/2020
by Simon Ramstedt, et al.

Action and observation delays occur in many reinforcement learning applications, such as remote-control scenarios. We study the anatomy of randomly delayed environments and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. We establish this theoretically and demonstrate it empirically on a delay-augmented version of the MuJoCo continuous control benchmark.
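For intuition about what "randomly delayed environments" means in practice, here is a minimal Python sketch of an environment wrapper that injects random observation and action delays, in the spirit of the delay-augmented benchmark mentioned above. This is an illustrative assumption, not the authors' released code: the class name `RandomDelayWrapper`, the delay bounds, and the zero-action filler are all hypothetical, and it assumes the classic Gym `reset`/`step` API.

```python
# Hypothetical sketch of a random-delay wrapper (not the paper's official code).
# Actions are queued and applied after a random delay; the agent observes a
# randomly stale observation. Assumes the classic Gym API (4-tuple step).
import random
from collections import deque

import gym


class RandomDelayWrapper(gym.Wrapper):
    """Buffers actions and observations to simulate random delays."""

    def __init__(self, env, max_obs_delay=2, max_act_delay=2):
        super().__init__(env)
        self.max_obs_delay = max_obs_delay
        self.max_act_delay = max_act_delay
        self.obs_buffer = None
        self.act_buffer = None

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Pre-fill the buffers so delayed reads are always defined.
        self.obs_buffer = deque([obs] * (self.max_obs_delay + 1),
                                maxlen=self.max_obs_delay + 1)
        noop = self.env.action_space.sample() * 0  # zero action as filler
        self.act_buffer = deque([noop] * (self.max_act_delay + 1),
                                maxlen=self.max_act_delay + 1)
        return self.obs_buffer[0]

    def step(self, action):
        # The agent's action joins the queue; a randomly delayed one is applied.
        self.act_buffer.append(action)
        act_delay = random.randint(0, self.max_act_delay)
        applied = self.act_buffer[-(act_delay + 1)]
        obs, reward, done, info = self.env.step(applied)
        # The freshest observation joins the buffer; the agent sees a stale one.
        self.obs_buffer.append(obs)
        obs_delay = random.randint(0, self.max_obs_delay)
        delayed_obs = self.obs_buffer[-(obs_delay + 1)]
        info = dict(info, obs_delay=obs_delay, act_delay=act_delay)
        return delayed_obs, reward, done, info
```

Buffering both streams this way makes the delays part of the environment dynamics, which is precisely what a delay-aware algorithm such as DCAC must account for when estimating values off-policy.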

Related research

05/26/2023 · Adaptive PD Control using Deep Reinforcement Learning for Local-Remote Teleoperation with Stochastic Time Delays
Local-remote systems allow robots to execute complex tasks in hazardous ...

12/20/2021 · Variational Quantum Soft Actor-Critic
Quantum computing has a superior advantage in tackling specific problems...

03/11/2020 · Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a va...

04/13/2021 · TASAC: Temporally Abstract Soft Actor-Critic for Continuous Control
We propose temporally abstract soft actor-critic (TASAC), an off-policy ...

07/29/2020 · Learning Object-conditioned Exploration using Distributed Soft Actor Critic
Object navigation is defined as navigating to an object of a given label...

06/04/2020 · Refined Continuous Control of DDPG Actors via Parametrised Activation
In this paper, we propose enhancing actor-critic reinforcement learning ...

10/23/2019 · Partially Detected Intelligent Traffic Signal Control: Environmental Adaptation
Partially Detected Intelligent Traffic Signal Control (PD-ITSC) systems ...
