Enhanced Scene Specificity with Sparse Dynamic Value Estimation

11/25/2020
by Jaskirat Singh, et al.

Multi-scene reinforcement learning trains an RL agent across multiple scenes or levels of the same task, and has become essential for many generalization applications. However, including multiple scenes increases the sample variance of policy gradient computations, so directly applying traditional methods (e.g. PPO, A3C) often yields suboptimal performance. One strategy for variance reduction is to treat each scene as a distinct Markov decision process (MDP) and learn a joint value function that depends on both the state (s) and the MDP (M). This is non-trivial, however, because in multi-scene RL the agent is usually unaware of the underlying level at both training and test time. Recently, Singh et al. [1] sought to address this by proposing a dynamic value estimation approach that models the true joint value function distribution as a Gaussian mixture model (GMM). In this paper, we argue that the error between the true scene-specific value function and the predicted dynamic estimate can be further reduced by progressively enforcing sparse cluster assignments once the agent has explored most of the state space. The resulting agents not only achieve significant improvements in final reward score across a range of OpenAI ProcGen environments, but also navigate more efficiently while completing a game level.
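As a rough illustration of the idea described in the abstract, the Python sketch below forms a value estimate as a weighted combination of per-cluster values and makes the cluster assignments progressively sparser. The abstract does not say how sparsity is enforced, so the temperature-scaled softmax, the linear annealing schedule, and every name here (`dynamic_value`, `annealed_temperature`, the example numbers) are assumptions for illustration only, not the authors' method.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a lower temperature gives sparser weights."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_value(cluster_means, assignment_logits, temperature=1.0):
    """Mixture-weighted value estimate: sum_i alpha_i * mu_i.

    cluster_means     -- per-cluster value predictions mu_i for one state
    assignment_logits -- unnormalized cluster assignment scores for that state
    """
    weights = softmax(assignment_logits, temperature)
    value = sum(w * mu for w, mu in zip(weights, cluster_means))
    return value, weights


def annealed_temperature(step, total_steps, start=1.0, end=0.1):
    """Hypothetical linear schedule: soft assignments early, sparse ones late."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


# Assumed example values for a single state with three clusters.
cluster_means = [1.0, 5.0, 9.0]
assignment_logits = [0.2, 1.5, 0.1]

for step in (0, 500, 1000):
    temperature = annealed_temperature(step, total_steps=1000)
    value, weights = dynamic_value(cluster_means, assignment_logits, temperature)
    rounded = [round(w, 3) for w in weights]
    print(f"step={step} T={temperature:.2f} weights={rounded} value={value:.3f}")
```

As the temperature falls, the weight concentrates on the highest-scoring cluster, so the estimate moves from a blend of all cluster values toward the value of a single cluster. Other sparsification schemes (for example top-k selection or an entropy penalty) would fit the same description.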

