A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

06/25/2020
by Shengyi Huang, et al.

In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action space will typically be invalid. The usual approach to this problem in policy gradient algorithms is to "mask out" invalid actions and sample only from the set of valid actions. The implications of this process, however, remain under-investigated. In this paper, we show that the standard working mechanism of invalid action masking corresponds to valid policy gradient updates. More importantly, it works by applying a state-dependent differentiable function during the calculation of the action probability distribution, a practice we do not find in any other DRL algorithms. Additionally, we show its critical importance to the performance of policy gradient algorithms. Specifically, our experiments show that invalid action masking scales well when the space of invalid actions is large, while the common approach of giving negative rewards for invalid actions fails.
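The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the mask is applied by replacing the logits of invalid actions with a large negative constant before the softmax. The names `masked_softmax`, `sample_action`, and `neg_fill`, and the value `-1e8`, are hypothetical choices made for illustration.

```python
import math
import random


def masked_softmax(logits, valid_mask, neg_fill=-1e8):
    """Return action probabilities with invalid actions masked out.

    Assumption: masking replaces each invalid logit with a large
    negative constant (neg_fill), so its probability underflows to 0.
    """
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # State-dependent step: the mask decides which logits are kept.
    masked = [x if ok else neg_fill for x, ok in zip(logits, valid_mask)]
    # Numerically stable softmax over the masked logits.
    top = max(masked)
    exps = [math.exp(x - top) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Sample an action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical state with four actions, of which only 0 and 2 are valid.
logits = [1.0, 2.0, 0.5, -1.0]
valid_mask = [True, False, True, False]

probs = masked_softmax(logits, valid_mask)
rng = random.Random(0)
actions = [sample_action(probs, rng) for _ in range(1000)]
```

In this sketch the masked entries are constants that do not depend on the original logits, so in an automatic-differentiation setting the gradient with respect to the logits of invalid actions would be zero, while the remaining probabilities stay differentiable in the valid logits.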

research
06/20/2017

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

We show how an action-dependent baseline can be used by the policy gradi...
research
01/26/2023

Joint action loss for proximal policy optimization

PPO (Proximal Policy Optimization) is a state-of-the-art policy gradient...
research
01/08/2020

Sample-based Distributional Policy Gradient

Distributional reinforcement learning (DRL) is a recent reinforcement le...
research
06/02/2018

Efficient Entropy for Policy Gradient with Multidimensional Action Space

In recent years, deep reinforcement learning has been shown to be adept ...
research
04/25/2018

Multiagent Soft Q-Learning

Policy gradient methods are often applied to reinforcement learning in c...
research
04/21/2021

Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching

Image-text matching is an important multi-modal task with massive applic...
research
11/15/2018

Orthogonal Policy Gradient and Autonomous Driving Application

One less addressed issue of deep reinforcement learning is the lack of g...
