Policy Manifold Search for Improving Diversity-based Neuroevolution

by Nemanja Rakicevic, et al.

Diversity-based approaches have recently gained popularity as an alternative paradigm to performance-based policy search. A popular approach from this family, Quality-Diversity (QD), maintains a collection of high-performing policies separated in the diversity-metric space, which is defined based on the policies' rollout behaviours. When policies are parameterised as neural networks, i.e. Neuroevolution, QD tends not to scale well with the dimensionality of the parameter space. Our hypothesis is that there exists a low-dimensional manifold embedded in the policy parameter space, containing a high density of diverse and feasible policies. We propose a novel approach to diversity-based policy search via Neuroevolution that leverages learned latent representations of the policy parameters, which capture the local structure of the data. Our approach iteratively collects policies according to the QD framework in order to (i) build a collection of diverse policies, (ii) use it to learn a latent representation of the policy parameters, and (iii) perform policy search in the learned latent space. We use the Jacobian of the inverse transformation (i.e. the reconstruction function) to guide the search in the latent space. This ensures that the generated samples remain in the high-density regions of the original space after reconstruction. We evaluate our contributions on three continuous control tasks in simulated environments, and compare to diversity-based baselines. The findings suggest that our approach yields a more efficient and robust policy search process.
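To make the Jacobian-guided latent search more concrete, the following is a minimal sketch of one way such a step could look. It is not the authors' implementation: the toy `decoder`, its weights `W` and `b`, the step size `sigma`, and the pseudo-inverse scaling rule are all assumptions introduced here for illustration. The idea shown is to sample a perturbation in the policy parameter space and map it into the latent space through the pseudo-inverse of the decoder Jacobian, so that the step taken in the latent space corresponds to a controlled step after reconstruction.

```python
import numpy as np

def decoder(z, W, b):
    """Hypothetical reconstruction function: latent (d,) -> policy parameters (D,)."""
    return np.tanh(W @ z + b)

def decoder_jacobian(z, W, b):
    """Analytic Jacobian of the toy decoder above, shape (D, d)."""
    pre = W @ z + b
    return (1.0 - np.tanh(pre) ** 2)[:, None] * W

def jacobian_scaled_mutation(z, W, b, sigma, rng):
    """Mutate a latent code with a step shaped by the decoder Jacobian.

    Assumption: an isotropic Gaussian step of scale sigma is drawn in the
    parameter space and pulled back to the latent space with pinv(J).
    """
    J = decoder_jacobian(z, W, b)
    eps = rng.standard_normal(J.shape[0])   # noise in parameter space, shape (D,)
    dz = sigma * np.linalg.pinv(J) @ eps    # corresponding latent step, shape (d,)
    return z + dz

rng = np.random.default_rng(0)
d, D = 2, 10                                # latent and parameter dimensions (toy values)
W = rng.standard_normal((D, d))
b = rng.standard_normal(D)

z = rng.standard_normal(d)                  # latent code of a policy from the collection
z_new = jacobian_scaled_mutation(z, W, b, sigma=0.05, rng=rng)
theta_new = decoder(z_new, W, b)            # reconstructed policy parameters to evaluate
```

In a full QD loop, `theta_new` would be rolled out in the environment and added to the collection if it is sufficiently novel or better performing; the decoder itself would be retrained periodically on the growing collection.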




Policy Manifold Search: Exploring the Manifold Hypothesis for Diversity-based Neuroevolution

Neuroevolution is an alternative to gradient-based optimisation that has...

Expressivity of Parameterized and Data-driven Representations in Quality Diversity Search

We consider multi-solution optimization and generative models for the ge...

Curiosity creates Diversity in Policy Search

When searching for policies, reward-sparse environments often lack suffi...

Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms

Reinforcement learning agents need a reward signal to learn successful p...

Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes

Lying on the heart of intelligent decision-making systems, how policy is...

The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers

In the context of neuroevolution, Quality-Diversity algorithms have prov...

EOS: Automatic In-vivo Evolution of Kernel Policies for Better Performance

Today's monolithic kernels often implement a small, fixed set of policie...
