Qualitative Measurements of Policy Discrepancy for Return-based Deep Q-Network

06/14/2018
by Wenjia Meng, et al.

In this paper, we focus on policy discrepancy in return-based deep Q-network (R-DQN) learning. We propose a general framework for R-DQN, with which most return-based reinforcement learning algorithms can be combined with DQN. We show that the performance of traditional DQN can be significantly improved by introducing return-based reinforcement learning. To further improve the performance of R-DQN, we present a strategy with two measurements that qualitatively measure the policy discrepancy. Moreover, we give a bound for each of these two measurements under the R-DQN framework. Algorithms with our strategy can accurately express the trace coefficient and achieve a better approximation to the return. The experiments are carried out on several representative tasks from the OpenAI Gym library. The results show that algorithms with our strategy outperform the state-of-the-art R-DQN methods.
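
For readers unfamiliar with the role of the trace coefficient, the following is a minimal sketch of a generic return-based (multi-step, off-policy) target of the kind such methods build on. It is not the strategy proposed in the paper: the abstract does not give the two measurements or their bounds, so the sketch uses the well-known Retrace-style coefficient, lambda * min(1, pi/mu), purely as a stand-in. The function names `return_based_target` and `retrace_coeffs`, and all argument names, are hypothetical.

```python
import numpy as np


def return_based_target(rewards, q_taken, expected_next_q, trace_coeffs, gamma=0.99):
    """Generic return-based target for the first state-action pair of a trajectory.

    Computes Q(x_0, a_0) + sum_t gamma^t * (c_1 * ... * c_t) * delta_t, where
    delta_t = r_t + gamma * E_pi[Q(x_{t+1}, .)] - Q(x_t, a_t).

    rewards[t]          -- reward r_t
    q_taken[t]          -- Q(x_t, a_t) for the action actually taken
    expected_next_q[t]  -- E_pi[Q(x_{t+1}, .)], with 0 after a terminal step
    trace_coeffs[t]     -- trace coefficient c_t (index 0 is unused)
    """
    rewards = np.asarray(rewards, dtype=float)
    q_taken = np.asarray(q_taken, dtype=float)
    expected_next_q = np.asarray(expected_next_q, dtype=float)
    trace_coeffs = np.asarray(trace_coeffs, dtype=float)

    # One-step temporal-difference errors along the trajectory.
    td_errors = rewards + gamma * expected_next_q - q_taken

    # Running product c_1 * ... * c_t; the empty product at t = 0 is 1.
    weights = np.concatenate(([1.0], np.cumprod(trace_coeffs[1:])))
    discounts = gamma ** np.arange(len(rewards))

    return q_taken[0] + np.sum(discounts * weights * td_errors)


def retrace_coeffs(pi_probs, mu_probs, lam=1.0):
    """Retrace-style coefficients (an assumption here, not the paper's strategy).

    pi_probs[t] and mu_probs[t] are the target- and behavior-policy
    probabilities of the action taken at step t.
    """
    ratio = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    return lam * np.minimum(1.0, ratio)
```

The two extremes show why the coefficient matters. With every coefficient set to zero the target collapses to the one-step DQN-style target, and with every coefficient set to one (and on-policy data) it telescopes to the full discounted return. A coefficient that tracks how far the target policy has drifted from the behavior policy sits between these two cases.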


