On the Geometry of Reinforcement Learning in Continuous State and Action Spaces

by Saket Tiwari et al.

Advances in reinforcement learning have led to its successful application in complex tasks with continuous state and action spaces. Despite these practical advances, most theoretical work pertains to finite state and action spaces. We propose building a theoretical understanding of continuous state and action spaces through a geometric lens. Central to our work is the idea that the transition dynamics induce a low-dimensional manifold of reachable states embedded in the high-dimensional nominal state space. We prove that, under certain conditions, the dimensionality of this manifold is at most the dimensionality of the action space plus one. This is the first result of its kind, linking the geometry of the state space to the dimensionality of the action space. We empirically corroborate this upper bound for four MuJoCo environments. We further demonstrate the applicability of our result by learning a policy in this low-dimensional representation. To do so, we introduce an algorithm that learns a mapping to a low-dimensional representation, realized as a narrow hidden layer of a deep neural network, in tandem with the policy using DDPG. Our experiments show that a policy learnt this way performs on par or better on four MuJoCo control suite tasks.
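The abstract does not state how the manifold dimension was estimated in the experiments. As an illustration only, the sketch below checks the "action dimension plus one" bound on a toy example using the Levina-Bickel maximum-likelihood intrinsic-dimension estimator; this estimator is an assumption and not necessarily the authors' method. The "states" are hypothetical: each one is a smooth function of a time t and a constant scalar action a, embedded in R^10, so the points lie on a 2-D manifold, which matches d_a + 1 for d_a = 1. The helper names `sample_toy_states` and `mle_intrinsic_dim` are illustrative, not from the paper.

```python
import math
import random


def sample_toy_states(n, seed=0):
    """Hypothetical stand-in for reachable states (NOT the paper's dynamics):
    each state is a smooth function of (time t, constant scalar action a)
    embedded in R^10, so the points lie on a 2-D manifold (d_a = 1, plus 1)."""
    rng = random.Random(seed)
    freqs = [1.0 + 0.3 * i for i in range(10)]  # distinct frequencies keep the map 2-D
    states = []
    for _ in range(n):
        t = rng.uniform(0.5, 2.0)
        a = rng.uniform(-1.0, 1.0)
        states.append([math.cos(w * t) + a * math.sin(w * t) for w in freqs])
    return states


def mle_intrinsic_dim(points, k=10):
    """Levina-Bickel maximum-likelihood intrinsic-dimension estimate,
    averaged over all points (brute-force O(n^2) neighbour search)."""
    local_estimates = []
    for i, x in enumerate(points):
        # Distances to the k nearest other points, sorted: T_1 <= ... <= T_k.
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)[:k]
        t_k = dists[-1]
        # m_k(x) = [ (1/(k-1)) * sum_{j<k} log(T_k / T_j) ]^{-1}
        log_ratios = [math.log(t_k / t_j) for t_j in dists[:-1]]
        local_estimates.append((k - 1) / sum(log_ratios))
    return sum(local_estimates) / len(local_estimates)


if __name__ == "__main__":
    action_dim = 1
    states = sample_toy_states(600)
    est = mle_intrinsic_dim(states, k=10)
    print(f"estimated dimension ~ {est:.2f}; bound d_a + 1 = {action_dim + 1}")
```

The estimate should come out close to 2 even though the nominal state space is 10-dimensional. Real environment rollouts would need a faster neighbour search and care with noise and sampling density, which this toy version ignores.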
