Decentralized Multi-Agent Linear Bandits with Safety Constraints

12/01/2020
∙
by   Sanae Amani, et al.
∙
0
∙

We study decentralized stochastic linear bandits, where a network of N agents acts cooperatively to efficiently solve a linear bandit-optimization problem over a d-dimensional space. For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network. At each round of the algorithm each agent chooses its actions following an upper confidence bound (UCB) strategy and agents share information with their immediate neighbors through a carefully designed consensus procedure that repeats over cycles. Our analysis adjusts the duration of these communication cycles ensuring near-optimal regret performance 𝒪(dlogNT√(NT)) at a communication rate of 𝒪(dN^2) per round. The structure of the network affects the regret performance via a small additive term - coined the regret of delay - that depends on the spectral gap of the underlying graph. Notably, our results apply to arbitrary network topologies without a requirement for a dedicated agent acting as a server. In consideration of situations with high communication cost, we propose RC-DLUCB: a modification of DLUCB with rare communication among agents. The new algorithm trades off regret performance for a significantly reduced total communication cost of 𝒪(d^3N^2.5) over all T rounds. Finally, we show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits. For the recently studied problem of linear bandits with unknown linear safety constraints, we propose the first safe decentralized algorithm. Our study contributes towards applying bandit techniques in safety-critical distributed systems that repeatedly deal with unknown stochastic environments. We present numerical simulations for various network topologies that corroborate our theoretical findings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
∙ 05/26/2022

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

We study distributed contextual linear bandits with stochastic contexts,...
research
∙ 05/12/2022

Collaborative Multi-agent Stochastic Linear Bandits

We study a collaborative multi-agent stochastic linear bandit setting, w...
research
∙ 06/08/2021

Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions

We study the problem of stochastic bandits with adversarial corruptions ...
research
∙ 02/10/2021

Multi-Agent Multi-Armed Bandits with Limited Communication

We consider the problem where N agents collaboratively interact with an ...
research
∙ 07/02/2020

Multi-Agent Low-Dimensional Linear Bandits

We study a multi-agent stochastic linear bandit with side information, p...
research
∙ 06/01/2022

An α-No-Regret Algorithm For Graphical Bilinear Bandits

We propose the first regret-based approach to the Graphical Bilinear Ban...
research
∙ 10/12/2022

Near-Optimal Multi-Agent Learning for Safe Coverage Control

In multi-agent coverage control problems, agents navigate their environm...

Please sign up or login with your details

Forgot password? Click here to reset