Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

06/10/2020
by Zaheen Farraz Ahmad, et al.

Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate actions are generated by exhaustively covering the action space, by applying domain knowledge, or, more recently, by learning a stochastic policy to guide the search. In this paper we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility. The marginal utility of an action generator measures the increase in value of an action over previously generated actions. We validate our approach in curling, a challenging stochastic domain with continuous state and action spaces, and in a location game with a discrete but large action space. We show that a generator trained with the marginal utility objective outperforms hand-coded schemes built on substantial domain knowledge, trained stochastic policies, and other natural objectives for generating actions for sample-based planners.
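To make the objective concrete, here is one plausible formalization, sketched only from the description above (the paper's exact definition may differ). Suppose a generator proposes candidate actions a_1, ..., a_k for a state s, and let V(s, a) denote the value of taking action a in s. The marginal utility of the k-th candidate can then be written as

    MU(a_k | a_1, ..., a_{k-1}) = max(0, V(s, a_k) - max_{i<k} V(s, a_i)),

i.e., the improvement the newest candidate offers over the best action generated so far. Under this reading, a generator trained to maximize marginal utility is rewarded for proposing actions that add value beyond its earlier proposals, rather than for repeatedly sampling the same high-value region.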


Related Research

01/22/2020 | Q-Learning in enormous action spaces via amortized approximate maximization
Applying Q-learning to high-dimensional or continuous action spaces can ...

10/19/2020 | Dream and Search to Control: Latent Space Planning for Continuous Control
Learning and planning with latent space dynamics has been shown to be us...

04/13/2021 | Learning and Planning in Complex Action Spaces
Many important real-world problems have action spaces that are high-dime...

06/14/2021 | RAPTOR: End-to-end Risk-Aware MDP Planning and Policy Learning by Backpropagation
Planning provides a framework for optimizing sequential decisions in com...

05/31/2023 | Handling Large Discrete Action Spaces via Dynamic Neighborhood Construction
Large discrete action spaces remain a central challenge for reinforcemen...

10/29/2020 | Deep Jump Q-Evaluation for Offline Policy Evaluation in Continuous Action Space
We consider off-policy evaluation (OPE) in continuous action domains, su...

05/09/2012 | Seeing the Forest Despite the Trees: Large Scale Spatial-Temporal Decision Making
We introduce a challenging real-world planning problem where actions mus...
