Plan-Space State Embeddings for Improved Reinforcement Learning

by Max Pflueger, et al.

Robot control problems are often structured with a policy function that maps state values to control values, but in many dynamic problems the observed state can have a difficult-to-characterize relationship with useful policy actions. In this paper we present a new method for learning state embeddings from plans or other forms of demonstrations such that the embedding space has a specified geometric relationship with the demonstrations. We present a novel variational framework for learning these embeddings that attempts to optimize trajectory linearity in the learned embedding space. We show how these embedding spaces can then be used as an augmentation to the robot state in reinforcement learning problems. We use kinodynamic planning to generate training trajectories for some example environments, and then train embedding spaces for these environments. We show empirically that observing a system in the learned embedding space improves the performance of policy gradient reinforcement learning algorithms, particularly by reducing the variance between training runs. Our technique is limited to environments where demonstration data is available, but places no limits on how that data is collected. Our embedding technique provides a way to transfer domain knowledge from existing technologies, such as planning and control algorithms, into more flexible policy learning algorithms by creating an abstract representation of the robot state with meaningful geometry.
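The abstract does not give the variational objective, so the following is only a hypothetical illustration of two ideas it names: what "trajectory linearity" in an embedding space could be measured as, and how an embedding might augment the robot state. The function names, the uniform-time straight-line target, and the use of plain concatenation for augmentation are all assumptions made for this sketch, not the authors' method.

```python
from typing import List, Sequence

Vector = List[float]


def linear_interp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Point at fraction t along the straight segment from a to b."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def linearity_error(traj: Sequence[Sequence[float]]) -> float:
    """Hypothetical linearity measure for an embedded trajectory.

    Returns the mean squared distance between each interior waypoint and the
    point it would occupy on the straight line joining the endpoints, assuming
    uniformly spaced time steps. A value of 0.0 means the embedded trajectory
    is a straight, constant-velocity line.
    """
    n = len(traj)
    if n < 3:
        return 0.0  # fewer than 3 points is trivially linear
    start, end = traj[0], traj[-1]
    total = 0.0
    for k in range(1, n - 1):
        target = linear_interp(start, end, k / (n - 1))
        total += sum((p - q) ** 2 for p, q in zip(traj[k], target))
    return total / (n - 2)


def augment_state(state: Sequence[float], embedding: Sequence[float]) -> Vector:
    """Assumed augmentation: concatenate raw state and its embedding
    so a policy observes both."""
    return list(state) + list(embedding)


if __name__ == "__main__":
    straight = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    bent = [[0.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    print(linearity_error(straight))  # 0.0
    print(linearity_error(bent))      # 2.0
    print(augment_state([0.1, 0.2], [1.0]))  # [0.1, 0.2, 1.0]
```

In a learning setup, a term like `linearity_error` could in principle be averaged over embedded demonstration trajectories and penalized during training. Because it compares against uniformly timed interpolation, it also penalizes uneven progress along a straight path, which is one of several possible design choices.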


Robobarista: Learning to Manipulate Novel Objects via Deep Multimodal Embedding

There is a large variety of objects and appliances in human environments...

Learning Video-Conditioned Policies for Unseen Manipulation Tasks

The ability to specify robot commands by a non-expert user is critical f...

Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces

Meta-reinforcement learning (RL) addresses the problem of sample ineffic...

State Representation Learning for Goal-Conditioned Reinforcement Learning

This paper presents a novel state representation for reward-free Markov ...

Off-Policy Evaluation in Embedded Spaces

Off-policy evaluation methods are important in recommendation systems an...

DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces

The ability to learn robust policies while generalizing over large discr...

Learning Robot Structure and Motion Embeddings using Graph Neural Networks

We propose a learning framework to find the representation of a robot's ...