LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward

by DaeJin Jo, et al.

Episodic count has been widely used to design a simple yet effective intrinsic reward for reinforcement learning with sparse rewards. However, using episodic counts in a high-dimensional state space, or over long episodes, requires thorough state compression and fast hashing, which hinders their rigorous use in such hard and complex exploration environments. Moreover, interference from task-irrelevant observations can cause the count-based intrinsic motivation to overlook task-relevant changes of state, and novelty measured in a purely episodic manner can lead the agent to repeatedly revisit states that are already familiar across episodes. To resolve these issues, in this paper we propose a learnable hash-based episodic count, which we name LECO, that serves efficiently as a task-specific intrinsic reward in hard exploration problems. In particular, the proposed intrinsic reward consists of an episodic novelty term and a task-specific modulation: the former employs a vector quantized variational autoencoder (VQ-VAE) to automatically obtain discrete state codes for fast counting, while the latter regulates the episodic novelty by learning a modulator that optimizes the task-specific extrinsic reward. LECO thereby enables an automatic transition from exploration to exploitation during reinforcement learning. We experimentally show that, in contrast to previous exploration methods, LECO successfully solves hard exploration problems and scales to large state spaces on the most difficult tasks in the MiniGrid and DMLab environments.
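The two ingredients described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes states have already been mapped to discrete codes (in LECO this is done by a learned VQ-VAE encoder, not shown here), and it stands in for the learned task-specific modulator with a plain multiplier argument. It shows only the count-based episodic novelty in its common 1/sqrt(N) form.

```python
import math
from collections import defaultdict


class EpisodicCountBonus:
    """Sketch of a count-based episodic intrinsic reward.

    Assumption: `code` is a hashable discrete state code (LECO obtains
    such codes from a VQ-VAE; here they are given). `modulation` is a
    placeholder for the learned task-specific modulator.
    """

    def __init__(self):
        # Per-episode visit counts over discrete state codes.
        self.counts = defaultdict(int)

    def reset(self):
        # Counts are episodic: clear them at every episode boundary.
        self.counts.clear()

    def bonus(self, code, modulation=1.0):
        # Increment the visit count for this code, then return a
        # novelty bonus that decays as 1 / sqrt(N(code)), scaled by
        # the (here fixed) task-specific modulation.
        self.counts[code] += 1
        return modulation / math.sqrt(self.counts[code])
```

For example, the first visit to a code within an episode yields a bonus of 1.0, the second 1/sqrt(2), and so on; after `reset()` the same code is novel again, which is exactly the cross-episode revisiting behavior that the task-specific modulation is meant to regulate.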


