Recurrent Predictive State Policy Networks

03/05/2018
by Ahmed Hefny, et al.

We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. An RPSP network consists of a recursive filter, which tracks a belief about the state of the environment, and a reactive policy that maps beliefs directly to actions so as to maximize cumulative reward. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz and Gordon, 2004; Sun et al., 2016) by modeling the predictive state: a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2018) for initializing the recursive filter. The predictive state serves as an equivalent representation of a belief state, so the policy component of the RPSP network can be purely reactive, which simplifies training while still allowing optimal behaviour. We also use the PSR interpretation during training by incorporating prediction error in the loss function. The entire network (recursive filter and reactive policy) remains differentiable and can be trained with gradient-based methods. We optimize the policy using a combination of policy gradient based on rewards (Williams, 1992) and gradient descent based on prediction error. We demonstrate the efficacy of RPSP networks under partial observability on a set of robotic control tasks from OpenAI Gym. Empirically, RPSP networks perform well compared with memory-preserving networks such as GRUs and with finite-memory models, and are the best-performing method overall.
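
To make the structure described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a generic tanh recurrent update as a stand-in for the PSR-based filter, a linear Gaussian reactive policy, and a linear next-observation predictor. All names (`init_params`, `filter_step`, `combined_loss`, `alpha_pg`, `alpha_pred`) are hypothetical and chosen only for illustration. It shows how a reward-based policy-gradient surrogate and a prediction-error term can be summed into a single objective.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_params(dim_q, dim_a, dim_o, seed=0):
    """Small random weights. All shapes and names here are illustrative assumptions."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_q": mat(dim_q, dim_q),     # belief -> belief
        "W_a": mat(dim_q, dim_a),     # action -> belief
        "W_o": mat(dim_q, dim_o),     # observation -> belief
        "W_pi": mat(dim_a, dim_q),    # belief -> action mean (reactive policy)
        "W_pred": mat(dim_o, dim_q),  # belief -> predicted observation
    }


def filter_step(params, q, a, o):
    """Recursive filter: update belief q after taking action a and observing o.

    Assumption: a plain tanh recurrent update stands in for the PSR filter.
    """
    pq = matvec(params["W_q"], q)
    pa = matvec(params["W_a"], a)
    po = matvec(params["W_o"], o)
    return [math.tanh(x + y + z) for x, y, z in zip(pq, pa, po)]


def gaussian_log_prob(a, mean, sigma):
    """Log-density of a under an isotropic Gaussian N(mean, sigma^2 I)."""
    const = math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    return sum(-0.5 * ((ai - mi) / sigma) ** 2 - const for ai, mi in zip(a, mean))


def combined_loss(params, q0, observations, actions, rewards,
                  sigma=0.5, alpha_pg=1.0, alpha_pred=0.1):
    """Return (total, pg_term, pred_term) for one recorded trajectory.

    pg_term is a REINFORCE-style surrogate: -sum_t log pi(a_t | q_t) * R_t,
    where R_t is the undiscounted reward-to-go.
    pred_term is the squared error of predicting each observation from the belief.
    """
    # Reward-to-go for each time step.
    returns, running = [], 0.0
    for r in reversed(rewards):
        running += r
        returns.append(running)
    returns.reverse()

    q, pg_term, pred_term = q0, 0.0, 0.0
    for t, (o, a) in enumerate(zip(observations, actions)):
        o_hat = matvec(params["W_pred"], q)    # predict before seeing o
        pred_term += sum((p - x) ** 2 for p, x in zip(o_hat, o))
        mean = matvec(params["W_pi"], q)       # policy sees only the belief
        pg_term += -gaussian_log_prob(a, mean, sigma) * returns[t]
        q = filter_step(params, q, a, o)       # fold (a, o) into the belief
    return alpha_pg * pg_term + alpha_pred * pred_term, pg_term, pred_term
```

In this sketch the policy never sees raw observations, only the belief `q`, which is what makes it reactive. An automatic-differentiation framework would then take gradients of the total with respect to all weights. The relative weighting of the two terms (`alpha_pg`, `alpha_pred`) is an assumption here, not a value taken from the paper.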
