Reinforcement Learning with Non-Markovian Rewards

12/05/2019
by Maor Gaon, et al.

The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is that rewards depend only on the last state and action. Yet many real-world rewards are non-Markovian. For example, a reward for bringing coffee only if it was requested earlier and has not yet been served is non-Markovian if the state records only current requests and deliveries. Past work considered the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but we know of no principled approaches for RL with NMR. Here, we address the problem of policy learning from experience with such rewards. We describe and empirically evaluate four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR. We also prove that some of these variants converge to an optimal policy in the limit.
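To make the coffee example concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a hypothetical three-event alphabet (`request`, `deliver`, `idle`) and a hand-written two-state automaton; the approaches described in the abstract would learn such an automaton from experience rather than receive it. The sketch shows that the reward for a delivery depends on the history, and that tracking one extra automaton state is enough to make the reward a function of the current (automaton state, event) pair.

```python
from typing import List, Tuple

# Hypothetical event labels for the coffee example (illustrative names only).
REQUEST, DELIVER, IDLE = "request", "deliver", "idle"


def history_reward(events: List[str]) -> float:
    """Reward for the last event, computed from the whole history.

    A delivery pays 1.0 only if a request is pending, i.e. coffee was
    requested earlier and has not been served since.
    """
    pending = False
    for e in events[:-1]:
        if e == REQUEST:
            pending = True
        elif e == DELIVER:
            pending = False
    return 1.0 if events[-1] == DELIVER and pending else 0.0


def automaton_step(q: int, event: str) -> Tuple[int, float]:
    """One transition of an assumed two-state automaton.

    State 0 means no pending request, state 1 means a request is pending.
    Returns the next automaton state and the reward for this event.
    """
    if event == REQUEST:
        return 1, 0.0
    if event == DELIVER:
        return 0, (1.0 if q == 1 else 0.0)
    return q, 0.0


def automaton_rewards(events: List[str]) -> List[float]:
    """Replay a trace through the automaton and collect per-step rewards."""
    q, rewards = 0, []
    for e in events:
        q, r = automaton_step(q, e)
        rewards.append(r)
    return rewards


trace = [REQUEST, IDLE, DELIVER, DELIVER]
# The same event (DELIVER) earns different rewards at steps 3 and 4,
# so the reward is not a function of the last event alone.
per_step = [history_reward(trace[: i + 1]) for i in range(len(trace))]
```

Under these assumptions, a learner that augments its observation with the automaton state `q` sees a Markovian reward, which is the kind of structure an RL algorithm such as Q-learning can then work with.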
