On the Suboptimality of Thompson Sampling in High Dimensions

02/10/2021
by   Raymond Zhang, et al.

In this paper we consider Thompson Sampling for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, Thompson Sampling is suboptimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly. This phenomenon occurs under a wide variety of assumptions, including both non-linear and linear reward functions. We also show that adding a fixed amount of forced exploration to Thompson Sampling does not alleviate the problem. We complement our theoretical results with numerical experiments showing that, in practice, Thompson Sampling can indeed perform very poorly in high dimensions.
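To make the setting concrete, the following is a minimal sketch of Beta-Bernoulli Thompson Sampling for an m-of-d combinatorial semi-bandit, where each round the player selects m base arms and observes each selected arm's individual (semi-bandit) feedback. This is an illustrative toy instance, not the paper's exact construction; the function name, the Bernoulli reward assumption, and the m-of-d action set are choices made here for the example.

```python
import numpy as np

def thompson_semi_bandit(mu, m, horizon, rng=None):
    """Beta-Bernoulli Thompson Sampling for an m-of-d combinatorial
    semi-bandit (illustrative sketch; not the paper's exact setting).

    mu      : true Bernoulli means of the d base arms (unknown to the player)
    m       : number of arms played per round
    horizon : number of rounds
    Returns the cumulative pseudo-regret trajectory.
    """
    rng = np.random.default_rng(rng)
    d = len(mu)
    alpha = np.ones(d)  # Beta posterior: 1 + observed successes per arm
    beta = np.ones(d)   # Beta posterior: 1 + observed failures per arm
    best = np.sort(mu)[-m:].sum()  # expected reward of the optimal m-subset
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        theta = rng.beta(alpha, beta)        # one posterior sample per arm
        action = np.argsort(theta)[-m:]      # play the m best sampled arms
        obs = rng.random(m) < mu[action]     # per-arm semi-bandit feedback
        alpha[action] += obs                 # posterior update on played arms
        beta[action] += 1 - obs
        total += best - mu[action].sum()     # instantaneous pseudo-regret
        regret[t] = total
    return regret
```

For separated arm means in low dimensions this sketch typically accumulates regret slowly; the paper's point is that this benign behavior breaks down as the ambient dimension grows, with regret scaling exponentially in the dimension.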

