Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward

08/24/2023
by Taisuke Kobayashi, et al.

Robot control using reinforcement learning has become popular, but in practice its episodes are generally terminated partway through for safety and time-saving reasons. This study addresses a problem with the most popular exception handling that temporal-difference (TD) learning performs at such termination. That is, forcibly assuming zero value after termination causes unintended, implicit underestimation or overestimation, depending on the reward design in the normal states. When the episode is terminated due to task failure, the unintentional overestimation may cause the failure to be highly valued, and a wrong policy may be acquired. Although this problem can be avoided by careful reward design, it is essential for the practical use of TD learning to revisit the exception handling at termination. This paper therefore proposes a method that intentionally underestimates the value after termination to avoid learning failures caused by the unintentional overestimation. In addition, the degree of underestimation is adjusted according to the degree of stationarity at termination, thereby preventing excessive exploration due to the intentional underestimation. Simulations and real-robot experiments showed that the proposed method can stably obtain optimal policies for various tasks and reward designs. https://youtu.be/AxXr8uFOe7M
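The claim that a forced zero after termination acts as an implicit underestimate or overestimate, depending on the reward design, can be checked with a one-step TD target. The following is a minimal Python sketch, assuming a constant per-step reward so that the value of continuing forever is r / (1 - gamma). All function names are hypothetical. With an alive bonus (+1), failing yields a lower target than continuing; with a step cost (-1), failing yields a higher target, which is the unintentional overestimation of failure. The helper pessimistic_terminal_value is only an illustrative way to push the terminal value below zero, scaled by a hypothetical knob beta. It is not the paper's formula, and the stationarity-based adjustment is not reproduced here.

```python
"""Sketch: how the conventional zero terminal value in TD learning can bias
the target up or down depending on the sign of the per-step reward."""


def td_target(reward: float, next_value: float, done: bool,
              gamma: float, terminal_value: float = 0.0) -> float:
    """One-step TD target r + gamma * V(s').

    At termination the bootstrap value V(s') is replaced by terminal_value;
    the conventional choice is 0.0.
    """
    bootstrap = terminal_value if done else next_value
    return reward + gamma * bootstrap


def constant_reward_value(reward: float, gamma: float) -> float:
    """Discounted value of receiving the same reward forever: r / (1 - gamma)."""
    return reward / (1.0 - gamma)


def pessimistic_terminal_value(r_min: float, gamma: float, beta: float) -> float:
    """Hypothetical intentionally-underestimated terminal value.

    beta = 0 recovers the conventional zero; beta = 1 uses the worst-case
    discounted return min(r_min, 0) / (1 - gamma), which never exceeds 0.
    NOTE: an illustrative choice, not the paper's formula.
    """
    return beta * min(r_min, 0.0) / (1.0 - gamma)


def compare(reward: float, gamma: float, terminal_value: float = 0.0):
    """Return (target when continuing forever, target when the episode fails now)."""
    v_next = constant_reward_value(reward, gamma)
    y_continue = td_target(reward, v_next, done=False, gamma=gamma)
    y_fail = td_target(reward, 0.0, done=True, gamma=gamma,
                       terminal_value=terminal_value)
    return y_continue, y_fail


if __name__ == "__main__":
    GAMMA = 0.75  # exactly representable, so the printed values are exact
    for name, r in [("alive bonus", 1.0), ("step cost", -1.0)]:
        y_cont, y_fail = compare(r, GAMMA)
        print(f"{name:11s}  continue={y_cont:+.2f}  fail={y_fail:+.2f}")
    # Assumed (hypothetical) lower bound of -2 on the per-step reward.
    v_term = pessimistic_terminal_value(r_min=-2.0, gamma=GAMMA, beta=1.0)
    y_cont, y_fail = compare(-1.0, GAMMA, terminal_value=v_term)
    print(f"step cost, pessimistic V_term={v_term:+.2f}: "
          f"continue={y_cont:+.2f}  fail={y_fail:+.2f}")
```

With GAMMA = 0.75, the step-cost case gives a failure target of -1.00 against -4.00 for continuing, so the zero terminal value makes failing look preferable. With the pessimistic terminal value of -8.00, the failure target drops to -7.00, below the value of continuing.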
