Thompson Sampling for Bandits with Clustered Arms

09/06/2021
by Emil Carlsson, et al.

We propose algorithms based on a multi-level Thompson sampling scheme for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, that exploiting a given cluster structure can significantly improve regret and computational cost compared to standard Thompson sampling. For the stochastic multi-armed bandit, we give upper bounds on the expected cumulative regret that show how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
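To make the idea of a multi-level scheme concrete, the following is a minimal sketch of two-level Thompson sampling over clustered Bernoulli arms: sample a cluster from per-cluster posteriors, then sample an arm within that cluster from per-arm posteriors. This is an illustrative toy (class name, Beta-Bernoulli model, and the use of a pooled per-cluster posterior are all assumptions for exposition), not the authors' exact algorithm.

```python
import random

class TwoLevelTS:
    """Illustrative two-level Thompson sampling for clustered Bernoulli arms.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior; each
    cluster keeps a pooled Beta posterior over all rewards observed from
    its arms (a simplifying assumption, not the paper's construction).
    """

    def __init__(self, clusters):
        # clusters: list of lists of arm ids, e.g. [[0, 1], [2, 3]]
        self.clusters = clusters
        self.arm_stats = {a: [1, 1] for c in clusters for a in c}
        self.cluster_stats = [[1, 1] for _ in clusters]

    def select_arm(self):
        # Level 1: one posterior draw per cluster; pick the best cluster.
        c = max(range(len(self.clusters)),
                key=lambda i: random.betavariate(*self.cluster_stats[i]))
        # Level 2: one posterior draw per arm in that cluster; pick the best arm.
        a = max(self.clusters[c],
                key=lambda arm: random.betavariate(*self.arm_stats[arm]))
        return c, a

    def update(self, c, a, reward):
        # Bernoulli reward in {0, 1} updates both the arm and cluster posteriors.
        self.arm_stats[a][0] += reward
        self.arm_stats[a][1] += 1 - reward
        self.cluster_stats[c][0] += reward
        self.cluster_stats[c][1] += 1 - reward
```

Compared to flat Thompson sampling, each round draws one sample per cluster plus one per arm in the chosen cluster, rather than one per arm overall, which is where the computational saving comes from; a good clustering also lets the learner discard poor clusters without exploring every arm inside them.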


