Sampling Efficient Deep Reinforcement Learning through Preference-Guided Stochastic Exploration

06/20/2022
by   Wenhui Huang, et al.

A large body of practical work built on the deep Q-network (DQN) algorithm indicates that a stochastic policy, despite its simplicity, is the most frequently used exploration approach. However, most existing stochastic exploration approaches either explore new actions heuristically, regardless of Q-values, or inevitably introduce bias into the learning process in order to couple sampling with Q-values. In this paper, we propose a novel preference-guided ϵ-greedy exploration algorithm that can efficiently learn an action distribution in line with the landscape of Q-values for DQN without introducing additional bias. Specifically, we design a dual architecture consisting of two branches. One branch, the Q-branch, is a copy of DQN. The other, which we call the preference branch, learns the action preference that the DQN implicitly follows. We theoretically prove that the policy improvement theorem holds for the preference-guided ϵ-greedy policy, and we show experimentally that the inferred action preference distribution aligns with the landscape of the corresponding Q-values. Consequently, preference-guided ϵ-greedy exploration motivates the DQN agent to take diverse actions: actions with larger Q-values are sampled more frequently, whereas actions with smaller Q-values still have a chance to be explored, which encourages exploration. We assess the proposed method with four well-known DQN variants in nine different environments. Extensive results confirm the superiority of the proposed method in terms of performance and convergence speed.

Index Terms: Preference-guided exploration, stochastic policy, data efficiency, deep reinforcement learning, deep Q-learning.
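The abstract does not spell out how the preference distribution enters the ϵ-greedy rule. The following is a minimal sketch under one plausible reading, not the authors' implementation: with probability 1 - ϵ the agent takes the greedy action of the Q-branch, and with probability ϵ it samples from a softmax over the preference-branch outputs instead of sampling uniformly. The function names, the softmax step, and the example numbers are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw preference scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probabilities(q_values, preference_logits, epsilon):
    """Action probabilities of the assumed preference-guided epsilon-greedy rule.

    Assumption: the exploratory mass epsilon is spread according to the
    preference distribution rather than uniformly over actions.
    """
    prefs = softmax(preference_logits)
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    probs = [epsilon * p for p in prefs]
    probs[greedy] += 1.0 - epsilon  # remaining mass goes to the greedy action
    return probs


def select_action(q_values, preference_logits, epsilon, rng=random):
    """Sample one action index from the policy defined above."""
    probs = policy_probabilities(q_values, preference_logits, epsilon)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical outputs of the two branches for a single state.
q_values = [1.0, 3.0, 2.0]
preference_logits = [0.0, 2.0, 1.0]
probs = policy_probabilities(q_values, preference_logits, epsilon=0.2)
action = select_action(q_values, preference_logits, 0.2, random.Random(0))
```

Under this reading every action keeps a nonzero probability, and the exploratory probability mass follows the ordering of the preference scores, which matches the behavior the abstract describes in words.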


research
02/10/2021

Policy Augmentation: An Exploration Strategy for Faster Convergence of Deep Reinforcement Learning Algorithms

Despite advancements in deep reinforcement learning algorithms, developi...
research
08/05/2019

Construction of Macro Actions for Deep Reinforcement Learning

Conventional deep reinforcement learning typically determines an appropr...
research
06/27/2019

ExTra: Transfer-guided Exploration

In this work we present a novel approach for transfer-guided exploration...
research
10/26/2022

Knowledge-Guided Exploration in Deep Reinforcement Learning

This paper proposes a new method to drastically speed up deep reinforcem...
research
01/26/2022

Exploiting Semantic Epsilon Greedy Exploration Strategy in Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) can model many real world appl...
research
08/22/2023

Careful at Estimation and Bold at Exploration

Exploration strategies in continuous action space are often heuristic du...
research
02/27/2010

Learning from Logged Implicit Exploration Data

We provide a sound and consistent foundation for the use of nonrandom ex...
