Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors

03/06/2023
by Björn Lindenberg, et al.

We consider Thompson sampling for linear bandit problems with finitely many independent arms, where rewards are drawn from normal distributions whose means depend linearly on unknown parameter vectors and whose variances are unknown. Specifically, in a Bayesian formulation, we use multivariate normal-gamma priors to represent environment uncertainty over all parameters involved. We show that the chosen sampling prior is conjugate to the reward model and derive a Bayesian regret bound for Thompson sampling under the condition that the 5/2-moment of the variance distribution exists.
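
To make the setup in the abstract concrete, the following is a minimal sketch of one way Thompson sampling with a normal-gamma prior could look. It is not taken from the paper. It uses the standard textbook conjugate update for Bayesian linear regression with unknown noise precision, and the class name `NormalGammaArm`, the function `thompson_step`, the shared context vector `x`, and the prior values `a0` and `b0` are all illustrative assumptions. Under this parameterization the variance is inverse-gamma distributed, so its 5/2-moment is finite when the shape `a` exceeds 5/2, which is why the default `a0` is set to 3.0. Whether this matches the paper's exact condition is an assumption.

```python
import numpy as np


class NormalGammaArm:
    """Hypothetical per-arm posterior: theta | tau ~ N(mu, (tau * Lam)^-1), tau ~ Gamma(a, rate=b)."""

    def __init__(self, dim, a0=3.0, b0=1.0):
        self.mu = np.zeros(dim)   # prior mean of the parameter vector (assumed zero)
        self.Lam = np.eye(dim)    # prior precision scale matrix (assumed identity)
        self.a = a0               # gamma shape; a > 5/2 gives a finite 5/2-moment of 1/tau
        self.b = b0               # gamma rate

    def sample(self, rng):
        """Draw (theta, tau) jointly from the current normal-gamma posterior."""
        tau = rng.gamma(shape=self.a, scale=1.0 / self.b)
        cov = np.linalg.inv(self.Lam) / tau
        theta = rng.multivariate_normal(self.mu, cov)
        return theta, tau

    def update(self, x, r):
        """Standard conjugate update after observing reward r for feature vector x."""
        Lam_new = self.Lam + np.outer(x, x)
        mu_new = np.linalg.solve(Lam_new, self.Lam @ self.mu + x * r)
        self.a += 0.5
        self.b += 0.5 * (r * r + self.mu @ self.Lam @ self.mu - mu_new @ Lam_new @ mu_new)
        self.mu, self.Lam = mu_new, Lam_new


def thompson_step(arms, x, rng):
    """Sample parameters for every arm and pick the arm with the largest sampled mean reward."""
    scores = [arm.sample(rng)[0] @ x for arm in arms]
    return int(np.argmax(scores))
```

A typical loop would call `thompson_step(arms, x, rng)` with `rng = np.random.default_rng(seed)`, observe the reward of the chosen arm, and then call `update(x, r)` on that arm only, since the arms are treated as independent.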

research
04/21/2013

Prior-free and prior-dependent regret bounds for Thompson Sampling

We consider the stochastic multi-armed bandit problem with a prior distr...
research
02/07/2018

Gradient conjugate priors and deep neural networks

The paper deals with learning the probability distribution of the observ...
research
07/13/2021

No Regrets for Learning the Prior in Bandits

We propose AdaTS, a Thompson sampling algorithm that adapts sequentially...
research
03/27/2023

Prior Elicitation for Generalised Linear Models and Extensions

A statistical method for the elicitation of priors in Bayesian generalis...
research
03/07/2021

CORe: Capitalizing On Rewards in Bandit Exploration

We propose a bandit algorithm that explores purely by randomizing its pa...
research
09/12/2023

Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors

Thompson sampling (TS) is one of the most popular and earliest algorithm...
research
07/05/2015

Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret Analysis

We consider the correlated multiarmed bandit (MAB) problem in which the ...
