Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning

01/24/2022
by Haichao Zhang, et al.

Standard model-free reinforcement learning algorithms optimize a policy that generates the action to be taken at the current time step in order to maximize expected future return. While flexible, this approach suffers from inefficient exploration because of its single-step nature. In this work, we present the Generative Planning Method (GPM), which generates actions not only for the current step but also for a number of future steps (hence the term generative planning). This brings several benefits. First, since GPM is trained by maximizing value, the plans it generates can be regarded as intentional action sequences for reaching high-value regions. GPM can therefore use its generated multi-step plans for temporally coordinated exploration towards high-value regions, which is potentially more effective than a sequence of actions generated by perturbing each action at the single-step level, whose consistent movement decays exponentially with the number of exploration steps. Second, starting from a crude initial plan generator, GPM can refine it to adapt to the task, which in turn benefits future exploration. This is potentially more effective than the commonly used action-repeat strategy, whose plans are non-adaptive in form. Additionally, since the multi-step plan can be interpreted as the agent's intent from the present over a span of time into the future, it offers a more informative and intuitive signal for interpretation. Experiments on several benchmark environments demonstrate its effectiveness compared with several baseline methods.
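The contrast between single-step perturbation and temporally coordinated exploration can be illustrated with a toy one-dimensional walk. The sketch below is not GPM: it has no value function and no learned plan generator, and the committed walker is closer to the fixed action-repeat strategy the abstract mentions. The function names, the plan length of 8, and the 64-step horizon are all assumptions chosen for illustration. It only shows how far an agent tends to move when each step is sampled independently versus when a sampled direction is held for several steps.

```python
import random


def per_step_walk(n_steps, rng):
    """Sample an independent +/-1 action at every step (single-step noise)."""
    return sum(rng.choice((-1, 1)) for _ in range(n_steps))


def plan_walk(n_steps, plan_len, rng):
    """Sample a direction, commit to it for plan_len steps, then resample.

    Hypothetical stand-in for a multi-step plan; the plans here are random
    and non-adaptive, unlike the learned plans described in the abstract.
    """
    position = 0
    steps_done = 0
    while steps_done < n_steps:
        direction = rng.choice((-1, 1))
        span = min(plan_len, n_steps - steps_done)  # truncate the last plan
        position += direction * span
        steps_done += span
    return position


def mean_abs_displacement(walk, n_trials, rng):
    """Average distance from the start after one walk, over n_trials runs."""
    return sum(abs(walk(rng)) for _ in range(n_trials)) / n_trials


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
n_steps = 64
single = mean_abs_displacement(lambda r: per_step_walk(n_steps, r), 2000, rng)
planned = mean_abs_displacement(lambda r: plan_walk(n_steps, 8, r), 2000, rng)

print(f"per-step noise, mean |displacement|:  {single:.2f}")
print(f"8-step commitment, mean |displacement|: {planned:.2f}")
```

In this toy setting the committed walker tends to end farther from its starting point, because independent per-step choices frequently cancel each other out. GPM's claimed advantage goes further than this: its plans are trained to head towards high-value regions and are refined over time, which a fixed commitment length cannot do.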


research
09/12/2022

Model-based Reinforcement Learning with Multi-step Plan Value Estimation

A promising way to improve the sample efficiency of reinforcement learni...
research
12/28/2018

Dynamic Planning Networks

We introduce Dynamic Planning Networks (DPN), a novel architecture for d...
research
06/07/2023

Dual policy as self-model for planning

Planning is a data efficient decision-making strategy where an agent sel...
research
05/30/2019

Combating the Compounding-Error Problem with a Multi-step Model

Model-based reinforcement learning is an appealing framework for creatin...
research
03/16/2023

Multi-step planning with learned effects of (possibly partial) action executions

In this paper, we propose an affordance model, which is built on Conditi...
research
08/17/2020

Estimating action plans for smart poultry houses

In poultry farming, the systematic choice, update, and implementation of...
research
04/22/2020

Flexible and Efficient Long-Range Planning Through Curious Exploration

Identifying algorithms that flexibly and efficiently discover temporally...
