Learning Neural Search Policies for Classical Planning

by Pawel Gomoluch, et al.

Heuristic forward search is currently the dominant paradigm in classical planning. Forward search algorithms typically rely on a single, relatively simple variation of best-first search and remain fixed throughout the process of solving a planning problem. Existing work combining multiple search techniques usually aims at supporting best-first search with an additional exploratory mechanism, triggered using a handcrafted criterion. A notable exception is very recent work which combines various search techniques using a trainable policy. It is, however, confined to a discrete action space comprising several fixed subroutines. In this paper, we introduce a parametrized search algorithm template which combines various search techniques within a single routine. The template's parameter space defines an infinite space of search algorithms, including, among others, BFS, local and random search. We further introduce a neural architecture for designating the values of the search parameters given the state of the search. This enables expressing neural search policies that change the values of the parameters as the search progresses. The policies can be learned automatically, with the objective of maximizing the planner's performance on a given distribution of planning problems. We consider a training setting based on a stochastic optimization algorithm known as the cross-entropy method (CEM). Experimental evaluation of our approach shows that it is capable of finding effective distribution-specific search policies, outperforming the relevant baselines.
