Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback

03/23/2023
by Mohammad Pedramfar, et al.

This paper investigates the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other words, the feedback observed at each round is an aggregate of reward components from past actions, and the learner does not know how that aggregate divides among the components. Three models of delayed feedback are studied (bounded adversarial, stochastic independent, and stochastic conditionally independent), and a regret bound is derived for each. Ignoring problem-dependent parameters, we show that the regret bound for all three delay models is $\tilde{O}(T^{2/3} + T^{1/3}\nu)$ for time horizon $T$, where $\nu$ is a delay parameter defined differently in each of the three cases. Delay therefore contributes only an additive term to the regret in all three models. The considered algorithm is shown to outperform other full-bandit approaches with delayed composite anonymous feedback.
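To make the composite anonymous feedback model concrete, here is a minimal Python sketch. It is not the paper's algorithm, and it does not reproduce any of its three delay models exactly. It assumes each round's reward is split into `max_delay + 1` components using random positive weights, with component `d` arriving `d` rounds later. In the bounded adversarial model an adversary would choose the split instead. The function name `simulate_composite_anonymous` and its parameters are hypothetical. The learner only sees the per-round sums it returns and cannot tell which past action contributed each piece.

```python
import random
from collections import defaultdict


def simulate_composite_anonymous(rewards, max_delay, seed=0):
    """Hypothetical illustration of delayed composite anonymous feedback.

    rewards[t] is the reward earned by the action played at round t.
    Each reward is split into max_delay + 1 components using random
    positive weights that sum to 1 (an assumed split rule); component d
    arrives at round t + d. Only the per-round aggregate is returned,
    so the origin of each component is hidden (anonymous).
    """
    rng = random.Random(seed)
    arrivals = defaultdict(float)  # round index -> total feedback arriving then
    for t, r in enumerate(rewards):
        # 1.0 - random() lies in (0, 1], so every weight is strictly positive
        weights = [1.0 - rng.random() for _ in range(max_delay + 1)]
        total = sum(weights)
        for d, w in enumerate(weights):
            arrivals[t + d] += r * (w / total)
    # Components of late-round rewards can spill past the last action
    horizon = len(rewards) + max_delay
    return [arrivals[t] for t in range(horizon)]
```

With `max_delay=0` the observations equal the rewards. With a positive delay the total observed feedback still equals the total reward, but individual rounds mix contributions from several past actions.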
