State-Aware Variational Thompson Sampling for Deep Q-Networks

02/07/2021
by Siddharth Aravindan, et al.

Thompson sampling is a well-known approach for balancing exploration and exploitation in reinforcement learning. It requires the posterior distribution of action-value functions to be maintained, which is generally intractable for tasks with a high-dimensional state-action space. We derive a variational Thompson sampling approximation for DQNs that uses a deep network whose parameters are perturbed by a learned variational noise distribution. We interpret the successful NoisyNets method <cit.> as an approximation to the variational Thompson sampling method that we derive. Further, we propose State Aware Noisy Exploration (SANE), which seeks to improve on NoisyNets by allowing a non-uniform perturbation, where the amount of parameter perturbation is conditioned on the state of the agent. This is done with the help of an auxiliary perturbation module, whose output is state-dependent and is learned end-to-end with gradient descent. We hypothesize that such state-aware noisy exploration is particularly useful in problems where exploration in certain high-risk states may cause the agent to fail badly. We demonstrate the effectiveness of the state-aware exploration method in the off-policy setting by augmenting DQNs with the auxiliary perturbation module.
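The mechanism described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. A single linear Q-layer has weights perturbed as w = mu + s * sigma * eps, with mu and sigma being the learnable parameters (the training step is omitted here). Setting s = 1 gives a uniform, NoisyNets-style perturbation. Letting s = s(x) come from an auxiliary module gives a state-aware one. Drawing one fresh eps per decision and acting greedily on the resulting Q-values is the sampling step that makes this an approximate Thompson sampler. The class and function names, the softplus-of-a-linear-map form of the perturbation module, and all initial values are hypothetical choices made for illustration.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); always strictly positive.
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)


class NoisyLinear:
    """Linear Q-layer with Gaussian weight noise: w = mu_w + scale * sigma_w * eps_w.
    mu and sigma would be learned by gradient descent (not shown)."""

    def __init__(self, n_in, n_out, rng, sigma0=0.017):
        bound = 1.0 / np.sqrt(n_in)  # illustrative initialisation, not taken from the paper
        self.mu_w = rng.uniform(-bound, bound, size=(n_out, n_in))
        self.mu_b = rng.uniform(-bound, bound, size=n_out)
        self.sigma_w = np.full((n_out, n_in), sigma0)
        self.sigma_b = np.full(n_out, sigma0)

    def forward(self, x, eps_w, eps_b, scale=1.0):
        # scale = 1.0 perturbs every state equally (NoisyNets-like);
        # a state-dependent scale gives a SANE-style, non-uniform perturbation.
        w = self.mu_w + scale * self.sigma_w * eps_w
        b = self.mu_b + scale * self.sigma_b * eps_b
        return w @ x + b


class PerturbationModule:
    """Hypothetical auxiliary module mapping a state to a positive noise scale s(x)."""

    def __init__(self, n_in, rng):
        self.v = rng.normal(0.0, 0.1, size=n_in)
        self.c = 0.0

    def __call__(self, x):
        return softplus(self.v @ x + self.c)


def select_action(layer, x, rng, perturb=None):
    """Draw one noise sample (one approximate posterior sample of the
    Q-network parameters) and act greedily with respect to it."""
    eps_w = rng.standard_normal(layer.mu_w.shape)
    eps_b = rng.standard_normal(layer.mu_b.shape)
    scale = 1.0 if perturb is None else perturb(x)
    q = layer.forward(x, eps_w, eps_b, scale)
    return int(np.argmax(q)), q


# Usage: the same layer acting with uniform vs. state-dependent noise scaling.
rng = np.random.default_rng(0)
layer = NoisyLinear(n_in=4, n_out=3, rng=rng)
module = PerturbationModule(n_in=4, rng=rng)
state = np.array([0.1, -0.2, 0.3, 0.05])
a_uniform, _ = select_action(layer, state, rng)              # NoisyNets-like
a_state, _ = select_action(layer, state, rng, perturb=module)  # state-aware
```

Because the module's output multiplies the noise rather than replacing it, a state where s(x) is driven close to zero behaves almost greedily with respect to the mean weights. This is one way to read the abstract's motivation about high-risk states.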
