Landmark Guided Active Exploration with Stable Low-level Policy Learning

06/30/2023
by   Fei Cui, et al.
0

Goal-conditioned hierarchical reinforcement learning (GCHRL) decomposes long-horizon tasks into sub-tasks through a hierarchical framework and it has demonstrated promising results across a variety of domains. However, the high-level policy's action space is often excessively large, presenting a significant challenge to effective exploration and resulting in potentially inefficient training. Moreover, the dynamic variability of the low-level policy introduces non-stationarity to the high-level state transition function, significantly impeding the learning of the high-level policy. In this paper, we design a measure of prospect for subgoals by planning in the goal space based on the goal-conditioned value function. Building upon the measure of prospect, we propose a landmark-guided exploration strategy by integrating the measures of prospect and novelty which aims to guide the agent to explore efficiently and improve sample efficiency. To address the non-stationarity arising from the dynamic changes of the low-level policy, we apply a state-specific regularization to the learning of low-level policy, which facilitates stable learning of the hierarchical policy. The experimental results demonstrate that our proposed exploration strategy significantly outperforms the baseline methods across multiple tasks.

READ FULL TEXT

page 1

page 4

page 6

page 7

page 8

research
10/26/2021

Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning

Goal-conditioned hierarchical reinforcement learning (HRL) has shown pro...
research
05/24/2022

Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning

Offline Reinforcement learning (RL) has shown potent in many safe-critic...
research
05/31/2021

Efficient Hierarchical Exploration with Stable Subgoal Representation Learning

Goal-conditioned hierarchical reinforcement learning (HRL) serves as a s...
research
05/13/2019

Learning and Exploiting Multiple Subgoals for Fast Exploration in Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning (HRL) exploits temporally extended a...
research
10/30/2021

Adjacency constraint for efficient hierarchical reinforcement learning

Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promisin...
research
02/08/2023

Learning Graph-Enhanced Commander-Executor for Multi-Agent Navigation

This paper investigates the multi-agent navigation problem, which requir...
research
07/01/2021

Goal-Conditioned Reinforcement Learning with Imagined Subgoals

Goal-conditioned reinforcement learning endows an agent with a large var...

Please sign up or login with your details

Forgot password? Click here to reset