Taming Non-stationary Bandits: A Bayesian Approach

by Vishnu Raj, et al.
Indian Institute of Technology, Madras

We consider the multi-armed bandit problem in non-stationary environments. Based on the Bayesian method, we propose a variant of Thompson Sampling that can be used in both rested and restless bandit scenarios. By applying discounting to the parameters of the prior distribution, we describe a way to systematically reduce the effect of past observations. Further, we derive the exact expression for the probability of picking sub-optimal arms. By increasing the exploitative value of Bayes' samples, we also provide an optimistic version of the algorithm. Extensive empirical analysis is conducted under various scenarios to validate the utility of the proposed algorithms. A comparison study with various state-of-the-art algorithms is also included.
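The abstract's core idea, discounting the parameters of the prior so that older observations are down-weighted, can be sketched for Bernoulli bandits with Beta priors. This is an illustrative reconstruction, not the authors' exact algorithm: the class name, the discount factor `gamma`, and the prior pseudo-counts `alpha0`/`beta0` are assumptions made for the example.

```python
import random

class DiscountedThompsonSampling:
    """Sketch of Thompson Sampling with discounted Beta priors for
    Bernoulli bandits (illustrative; parameter names are assumptions)."""

    def __init__(self, n_arms, gamma=0.95, alpha0=1.0, beta0=1.0):
        self.gamma = gamma                       # discount on past observations
        self.alpha0, self.beta0 = alpha0, beta0  # prior pseudo-counts
        self.alpha = [alpha0] * n_arms           # discounted success counts
        self.beta = [beta0] * n_arms             # discounted failure counts

    def select_arm(self):
        # Draw one Bayes sample per arm from its Beta posterior; play the best.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Shrink every arm's pseudo-counts toward the prior, then add the new
        # observation, so past data is geometrically down-weighted. This keeps
        # the posterior responsive when the reward distributions drift.
        for i in range(len(self.alpha)):
            self.alpha[i] = self.gamma * self.alpha[i] + (1 - self.gamma) * self.alpha0
            self.beta[i] = self.gamma * self.beta[i] + (1 - self.gamma) * self.beta0
        self.alpha[arm] += reward       # reward in {0, 1}
        self.beta[arm] += 1 - reward
```

Because the pseudo-counts decay toward the prior rather than growing without bound, the posterior never becomes so concentrated that a change in the best arm goes unnoticed, which is what makes the approach usable in restless settings.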



