Beyond Exponentially Discounted Sum: Automatic Learning of Return Function

05/28/2019
by Yufei Wang, et al.

In reinforcement learning (RL), the return, which is a weighted accumulation of future rewards, and the value, which is the expected return, serve as the objectives that guide policy learning. In classic RL, the return is defined as the exponentially discounted sum of future rewards. One key insight is that there may be many feasible ways to define the form of the return function (and thus the value) from which the same optimal policy can be derived, yet these different forms can lead to dramatically different speeds of learning that policy. In this paper, we study how to modify the form of the return function to improve learning of the optimal policy. We propose a general mathematical form for the return function and employ meta-learning to learn the optimal return function in an end-to-end manner. We test our methods on a specially designed maze environment and several Atari games, and our experimental results clearly indicate the advantages of automatically learning optimal return functions in reinforcement learning.
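To make the contrast concrete, the following minimal Python sketch compares the classic exponentially discounted return with a return built from arbitrary per-step weights. It is an illustration only: the abstract does not state the paper's general form or its meta-learning procedure, so the function names `discounted_return` and `weighted_return`, and the choice of a plain weight vector, are assumptions made for this example.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Classic return: sum over k of gamma**k * rewards[k]."""
    g = 0.0
    # Accumulate backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def weighted_return(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical generalization: sum over k of weights[k] * rewards[k].

    The weights are free parameters here; in a learned-return setting they
    would be produced by some trainable model (not shown, an assumption).
    """
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(w * r for w, r in zip(weights, rewards))


# Example trajectory with a single delayed reward.
rewards = [0.0, 0.0, 1.0]
gamma = 0.5

# Exponential discounting is the special case weights[k] = gamma**k.
exp_weights = [gamma ** k for k in range(len(rewards))]

classic = discounted_return(rewards, gamma)
general = weighted_return(rewards, exp_weights)
```

With weights set to `gamma ** k`, the two functions agree, which shows that exponential discounting is one point in a larger family of weightings; other weight choices credit the same rewards differently.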

