Contextual Bandits with Knapsacks for a Conversion Model

06/01/2022
by Zhen Li, et al.

We consider contextual bandits with knapsacks, with an underlying structure between the rewards generated and the cost vectors suffered. We do so motivated by sales with commercial discounts. At each round, given the stochastic i.i.d. context 𝐱_t and the arm picked a_t (corresponding, e.g., to a discount level), a customer conversion may be obtained, in which case a reward r(a_t,𝐱_t) is gained and vector costs c(a_t,𝐱_t) are suffered (corresponding, e.g., to losses of earnings). Otherwise, in the absence of a conversion, the reward and costs are null. The reward and costs achieved are thus coupled through the binary variable indicating whether a conversion took place. This underlying structure between rewards and costs differs from the linear structures considered by Agrawal and Devanur [2016], but we show that the techniques introduced in this article may also be applied to the latter case. Namely, the adaptive policies exhibited solve at each round a linear program based on upper-confidence estimates of the probabilities of conversion given a and 𝐱. This kind of policy is most natural and achieves a regret bound of the typical order (OPT/B) √(T), where B is the total budget allowed, OPT is the optimal expected reward achievable by a static policy, and T is the number of rounds.


