Nearly Optimal Adaptive Procedure for Piecewise-Stationary Bandit: a Change-Point Detection Approach

02/11/2018
by   Yang Cao, et al.

Multi-armed bandit (MAB) is a class of online learning problems in which a learning agent aims to maximize its expected cumulative reward while repeatedly choosing which of several arms with unknown reward distributions to pull. In this paper, we consider a scenario in which the arms' reward distributions may change in a piecewise-stationary fashion at unknown time steps. By connecting change-detection techniques with classic UCB algorithms, we motivate and propose a learning algorithm called M-UCB, which can detect and adapt to changes, for the considered scenario. We also establish an O(√(MKT log T)) regret bound for M-UCB, where T is the number of time steps, K is the number of arms, and M is the number of stationary segments (the regret analysis also involves the gaps between the expected rewards of the optimal and best suboptimal arms). Comparison with the best available lower bound shows that M-UCB is nearly optimal in T up to a logarithmic factor. We also compare M-UCB with state-of-the-art algorithms in a numerical experiment based on a public Yahoo! dataset, where M-UCB achieves roughly a 50% regret reduction relative to the best-performing baseline algorithm.
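To make the change-detection-plus-UCB idea concrete, below is a minimal Python sketch of a loop in the spirit of M-UCB, not the paper's exact procedure. It assumes a simple two-window mean-shift test as the change detector and random forced exploration; the window size w, detection threshold b, and exploration rate gamma are illustrative placeholders rather than the paper's tuned parameters, and bandit(arm, t) is a hypothetical environment callable returning a reward in [0, 1].

```python
import numpy as np

def change_detected(window, threshold):
    """Two-window mean-shift test (illustrative stand-in for the paper's
    change-point detector): flag a change if the reward sums of the first
    and second halves of the recent window differ by more than a threshold."""
    half = len(window) // 2
    return abs(sum(window[half:]) - sum(window[:half])) > threshold

def m_ucb_sketch(bandit, T, K, w=100, b=10.0, gamma=0.05, rng=None):
    """Illustrative M-UCB-style loop: UCB indices are computed from statistics
    gathered since the last detected change, with occasional forced uniform
    exploration so every arm keeps feeding its detector."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(K)               # pulls since the last restart
    sums = np.zeros(K)                 # reward sums since the last restart
    windows = [[] for _ in range(K)]   # recent rewards per arm for detection
    tau = 0                            # time step of the last restart
    total_reward = 0.0
    for t in range(T):
        if rng.random() < gamma:           # forced exploration
            arm = int(rng.integers(K))
        elif np.any(counts == 0):          # play each arm once after a restart
            arm = int(np.argmin(counts))
        else:                              # UCB index on post-restart data only
            n = t - tau + 1
            ucb = sums / counts + np.sqrt(2 * np.log(n) / counts)
            arm = int(np.argmax(ucb))
        r = bandit(arm, t)                 # hypothetical environment call
        total_reward += r
        counts[arm] += 1
        sums[arm] += r
        windows[arm].append(r)
        if len(windows[arm]) > w:
            windows[arm].pop(0)
        # Restart all statistics when a change is detected on the pulled arm,
        # since a distribution shift invalidates the old reward estimates.
        if len(windows[arm]) == w and change_detected(windows[arm], b):
            counts[:] = 0
            sums[:] = 0
            windows = [[] for _ in range(K)]
            tau = t + 1
    return total_reward
```

Restarting all arm statistics after a detection reflects the basic design choice described in the abstract: once a change is flagged, past observations are no longer trusted and the arms are re-explored before UCB indices are used again.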


Related research

08/27/2019 · A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits
We investigate the piecewise-stationary combinatorial semi-bandit proble...

06/09/2023 · Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit
We study a structured multi-agent multi-armed bandit (MAMAB) problem in ...

11/08/2017 · A Change-Detection based Framework for Piecewise-stationary Multi-Armed Bandit Problem
The multi-armed bandit problem has been extensively studied under the st...

05/18/2023 · Discounted Thompson Sampling for Non-Stationary Bandit Problems
Non-stationary multi-armed bandit (NS-MAB) problems have recently receiv...

09/12/2019 · Be Aware of Non-Stationarity: Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits
Cascading bandit (CB) is a variant of both the multi-armed bandit (MAB) ...

05/27/2022 · Safety Aware Changepoint Detection for Piecewise i.i.d. Bandits
In this paper, we consider the setting of piecewise i.i.d. bandits under...

12/29/2021 · Socially-Optimal Mechanism Design for Incentivized Online Learning
Multi-arm bandit (MAB) is a classic online learning framework that studi...
