Worst-case Performance of Greedy Policies in Bandits with Imperfect Context Observations

04/10/2022, by Hongju Park, et al.

Contextual bandits are canonical models for sequential decision-making under uncertainty in environments with time-varying components. In this setting, the expected reward of each bandit arm is the inner product of an unknown parameter and the context vector of that arm, perturbed by a random error. The classical setting relies heavily on fully observed contexts, while the richer model of imperfectly observed contextual bandits remains understudied. This work considers Greedy reinforcement learning policies that take actions as if the current estimates of the parameter and of the unobserved contexts coincide with the corresponding true values. We establish that the non-asymptotic worst-case regret grows logarithmically with the time horizon and the failure probability, while it scales linearly with the number of arms. Numerical analysis showcasing the efficiency of Greedy policies is also provided.
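As a concrete illustration of the certainty-equivalence rule described above, here is a minimal Python sketch of a Greedy policy under a simplified model: true contexts are isotropic Gaussian and are observed through additive Gaussian noise, the estimated context is the posterior mean E[x | y], and the parameter is fit by least squares on the estimated contexts. All dimensions, noise scales, and variable names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_arms, T = 5, 10, 2000                 # context dimension, arms, horizon (illustrative)
mu_star = rng.normal(size=d)               # unknown reward parameter (hypothetical values)
sigma_x, sigma_y, sigma_r = 1.0, 0.5, 0.1  # context / observation / reward noise scales

# Posterior-mean weight for E[x | y] when x ~ N(0, sigma_x^2 I) and y = x + N(0, sigma_y^2 I)
shrink = sigma_x**2 / (sigma_x**2 + sigma_y**2)

# Regularized least-squares statistics for estimating mu_star
V = np.eye(d)          # Gram matrix (ridge-regularized)
b = np.zeros(d)

regret = 0.0
for t in range(T):
    X = rng.normal(scale=sigma_x, size=(n_arms, d))       # true contexts (unobserved)
    Y = X + rng.normal(scale=sigma_y, size=(n_arms, d))   # imperfect observations
    X_hat = shrink * Y                                    # estimated contexts E[x | y]

    mu_hat = np.linalg.solve(V, b)                        # current parameter estimate
    a = int(np.argmax(X_hat @ mu_hat))                    # Greedy: treat estimates as true values

    r = X[a] @ mu_star + rng.normal(scale=sigma_r)        # reward from the true context
    regret += np.max(X @ mu_star) - X[a] @ mu_star        # regret vs. oracle with true contexts

    # Update least-squares statistics with the estimated context of the chosen arm
    V += np.outer(X_hat[a], X_hat[a])
    b += X_hat[a] * r

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

Note that the policy never explores explicitly: as the abstract indicates, the Greedy rule simply acts on the current parameter and context estimates as if they were exact, and cumulative regret is tracked against an oracle that observes the true contexts.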


