Stochastic Bandits with Linear Constraints

06/17/2020
by Aldo Pacchiano, et al.

We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies whose expected cumulative reward over the course of T rounds is maximized, while the expected cost of each policy remains below a certain threshold τ. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove an O(d√(T)/(τ - c_0)) bound on its T-round regret, where the denominator is the difference between the constraint threshold and the cost of a known feasible action. We further specialize our results to multi-armed bandits and propose a computationally efficient algorithm for this setting. We prove a regret bound of O(√(KT)/(τ - c_0)) for this algorithm in K-armed bandits, which is a √(K) improvement over the regret bound obtained by simply casting multi-armed bandits as an instance of contextual linear bandits and using the regret bound of OPLB. We also prove a lower bound for the problem studied in the paper and provide simulations to validate our theoretical results.
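To illustrate the optimistic-pessimistic idea described in the abstract, here is a minimal Python sketch of a constrained linear-bandit selection rule: reward and cost parameters are estimated by ridge regression, arms are screened using a pessimistic (upper) confidence bound on cost against the threshold τ, and among the surviving arms the one with the largest optimistic reward estimate is played. This is only an assumed simplification, not the authors' OPLB algorithm (which optimizes over mixed policies and exploits a known feasible action with cost c_0 < τ); the class name, the constant confidence radius `beta`, and the fallback arm are illustrative choices.

```python
# Hypothetical sketch of optimistic-pessimistic action selection for a
# constrained linear bandit. Not the paper's OPLB: per-arm screening is used
# instead of optimizing over mixed policies.

import numpy as np


class OptimisticPessimisticLinearBandit:
    def __init__(self, dim, tau, lam=1.0, beta=1.0):
        self.tau = tau                 # cost threshold
        self.beta = beta               # confidence-ellipsoid radius (assumed constant)
        self.A = lam * np.eye(dim)     # shared ridge-regression design matrix
        self.b_reward = np.zeros(dim)  # sum of x_t * observed reward
        self.b_cost = np.zeros(dim)    # sum of x_t * observed cost

    def select(self, arms):
        """arms: array of shape (K, dim); returns the index of the chosen arm."""
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b_reward  # reward parameter estimate
        mu_hat = A_inv @ self.b_cost       # cost parameter estimate

        best_idx, best_value = None, -np.inf
        for k, x in enumerate(arms):
            width = self.beta * np.sqrt(x @ A_inv @ x)
            optimistic_reward = x @ theta_hat + width  # optimistic (upper) reward
            pessimistic_cost = x @ mu_hat + width      # pessimistic (upper) cost
            if pessimistic_cost <= self.tau and optimistic_reward > best_value:
                best_idx, best_value = k, optimistic_reward

        # Fall back to arm 0 if nothing looks feasible; in the paper this role is
        # played by a known feasible action with cost c_0 < tau.
        return best_idx if best_idx is not None else 0

    def update(self, x, reward, cost):
        # Rank-one update of the design matrix and the regression targets.
        self.A += np.outer(x, x)
        self.b_reward += reward * x
        self.b_cost += cost * x
```

As a usage pattern, one would call `select` on the current round's feature vectors, play the returned arm, observe its reward and cost, and feed them back through `update`.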


