Deep Reinforcement Learning with Weighted Q-Learning

03/20/2020
by Andrea Cini, et al.

Overestimation of the maximum action-value is a well-known problem that hinders Q-Learning performance, leading to suboptimal policies and unstable learning. Among the several Q-Learning variants proposed to address this issue, Weighted Q-Learning (WQL) effectively reduces the bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action-values, where each weight is the probability that the corresponding action-value is the maximum; however, computing these probabilities is only practical in the tabular setting. In this work, we provide the methodological advances needed to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. We show that model uncertainty in DRL can be useful not only for action selection, but also for action evaluation. We analyze how the novel Weighted Deep Q-Learning algorithm reduces the bias w.r.t. relevant baselines and provide empirical evidence of its advantages on several representative benchmarks.
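The weighted estimate described in the abstract can be illustrated with a minimal NumPy sketch. It assumes we already have several sampled action-value vectors for one state (for example, from stochastic forward passes of a network with dropout left on); here those samples are simply drawn from a Gaussian. The function name `weighted_max_estimate` and the sample-counting estimate of the weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def weighted_max_estimate(q_samples):
    """Weighted estimate of the maximum action-value (illustrative sketch).

    q_samples: array of shape (num_samples, num_actions), one row per
    sampled vector of action-values for the same state.
    Returns (estimate, weights).
    """
    q_samples = np.asarray(q_samples, dtype=float)
    num_samples, num_actions = q_samples.shape

    # Weight of action a: fraction of samples in which a has the largest
    # value, a Monte Carlo estimate of P(a is the maximizing action).
    winners = np.argmax(q_samples, axis=1)
    weights = np.bincount(winners, minlength=num_actions) / num_samples

    # Mean action-value per action across samples.
    means = q_samples.mean(axis=0)

    # Weighted sum of the mean action-values.
    return float(weights @ means), weights


# Hypothetical setup: four actions whose true values are all zero,
# observed through noisy samples.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))

weighted, weights = weighted_max_estimate(samples)
plain_max = float(samples.mean(axis=0).max())  # standard max estimator
print(f"weighted: {weighted:.4f}  max of means: {plain_max:.4f}")
```

Because the weights are non-negative and sum to one, the weighted estimate can never exceed the maximum of the mean action-values, which is the sense in which it counteracts the overestimation of the plain max operator.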


