Self Punishment and Reward Backfill for Deep Q-Learning

04/10/2020
by Mohammad Reza Bonyadi, et al.

Reinforcement learning agents learn by favouring behaviours that maximize their total reward, which is usually provided by the environment. In many environments, however, the reward arrives after a series of actions rather than after each single action, leaving the agent uncertain about which of those actions were effective, an issue called the credit assignment problem. In this paper, we propose two strategies, inspired by behavioural psychology, to estimate a more informative reward value for actions that receive no reward. The first strategy, called self-punishment, discourages the agent from making mistakes, i.e., actions which lead to a terminal state. The second strategy, called reward backfill, backpropagates the rewards between two rewarded actions. We prove that, under certain assumptions, these two strategies preserve the ordering of all possible policies by total reward and, by extension, preserve the optimal policy. We incorporated these two strategies into three popular deep reinforcement learning approaches and evaluated the results on thirty Atari games. After parameter tuning, our results indicate that the proposed strategies improve the tested methods in over 65 percent of the tested games, with performance gains of up to more than 25 times.
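The abstract does not give the exact formulas, so the following is only a minimal sketch of how the two strategies might look when applied to the reward sequence of one finished episode. The function name `shape_rewards`, the parameters `terminal_penalty` and `backfill_decay`, and the geometric decay rule are all assumptions made for illustration, not the authors' definitions.

```python
from typing import List


def shape_rewards(
    rewards: List[float],
    terminal_penalty: float = 1.0,
    backfill_decay: float = 0.9,
) -> List[float]:
    """Return a shaped copy of one episode's per-step rewards.

    Hypothetical sketch: assumes the episode ended in a terminal state
    reached through a mistake, and that unrewarded steps have reward 0.
    """
    shaped = list(rewards)
    if not shaped:
        return shaped

    # Self-punishment (assumed form): subtract a fixed penalty from the
    # action that led to the terminal state.
    shaped[-1] -= terminal_penalty

    # Reward backfill (assumed form): walk backward through the episode and
    # give each unrewarded step a geometrically decayed share of the next
    # environment reward that follows it. Only original environment rewards
    # are propagated; the self-punishment penalty is not.
    carried = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        if rewards[t] != 0.0:
            carried = rewards[t]
        else:
            carried *= backfill_decay
            shaped[t] += carried
    return shaped


if __name__ == "__main__":
    episode = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(shape_rewards(episode, terminal_penalty=1.0, backfill_decay=0.5))
```

In a deep Q-learning setting, a transformation of this kind would presumably be applied to the transitions of an episode before they are stored in the replay buffer. Whether the paper does it this way is not stated in the abstract.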


research
02/19/2022

Shaping Advice in Deep Reinforcement Learning

Reinforcement learning involves agents interacting with an environment t...
research
03/29/2021

Shaping Advice in Deep Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning involves multiple agents interacting ...
research
11/18/2022

Credit-cognisant reinforcement learning for multi-agent cooperation

Traditional multi-agent reinforcement learning (MARL) algorithms, such a...
research
07/31/2021

Inverse Reinforcement Learning for Strategy Identification

In adversarial environments, one side could gain an advantage by identif...
research
04/29/2021

Adapting to Reward Progressivity via Spectral Reinforcement Learning

In this paper we consider reinforcement learning tasks with progressive ...
research
01/23/2018

Curiosity-driven reinforcement learning with homeostatic regulation

We propose a curiosity reward based on information theory principles and...
research
01/10/2014

Exploiting generalisation symmetries in accuracy-based learning classifier systems: An initial study

Modern learning classifier systems typically exploit a niched genetic al...
