On the Minimax Regret for Online Learning with Feedback Graphs

by   Khaled Eldowa, et al.

In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is 𝒪(√(α T ln K)), where K is the number of actions, α is the independence number of the graph, and T is the time horizon. The √(ln K) factor is known to be necessary when α = 1 (the experts case). On the other hand, when α = K (the bandits case), the minimax rate is known to be Θ(√(KT)), and a lower bound Ω(√(α T)) is known to hold for any α. Our improved upper bound 𝒪(√(α T (1 + ln(K/α)))) holds for any α and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with q-Tsallis entropy for a carefully chosen value of q ∈ [1/2, 1) that varies with α. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time-varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved Ω(√(α T (ln K)/(ln α))) lower bound for all α > 1, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as α < K.
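As a rough illustration of how the rates quoted in the abstract compare, the following sketch evaluates them numerically. It is not taken from the paper: the function names and the values of K and T are assumptions chosen for illustration, and all constants hidden by the asymptotic notation are set to 1, so only the relative scaling is meaningful.

```python
import math


def previous_upper_bound(alpha: float, K: int, T: int) -> float:
    """Previously best known rate sqrt(alpha * T * ln K), constants dropped."""
    return math.sqrt(alpha * T * math.log(K))


def improved_upper_bound(alpha: float, K: int, T: int) -> float:
    """Improved rate sqrt(alpha * T * (1 + ln(K / alpha))), constants dropped."""
    return math.sqrt(alpha * T * (1 + math.log(K / alpha)))


def improved_lower_bound(alpha: float, K: int, T: int) -> float:
    """Improved rate sqrt(alpha * T * ln K / ln alpha), stated for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the lower bound is stated only for alpha > 1")
    return math.sqrt(alpha * T * math.log(K) / math.log(alpha))


if __name__ == "__main__":
    K, T = 1024, 10_000  # hypothetical problem size
    for alpha in (1, 2, 32, 1024):
        old = previous_upper_bound(alpha, K, T)
        new = improved_upper_bound(alpha, K, T)
        # The lower bound is only defined for alpha > 1.
        low = improved_lower_bound(alpha, K, T) if alpha > 1 else float("nan")
        print(f"alpha={alpha:5d}  old={old:10.1f}  new={new:10.1f}  lower={low:10.1f}")
```

With these formulas, setting α = K makes the improved upper bound equal to √(KT), the bandit rate, while the previous bound retains an extra √(ln K) factor. Setting α = 1 gives √(T(1 + ln K)), which matches the experts rate up to constants.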



∙ 03/30/2019

Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

We study the linear contextual bandit problem with finite action sets. W...
∙ 05/06/2023

An improved regret analysis for UCB-N and TS-N

In the setting of stochastic online learning with undirected feedback gr...
∙ 07/11/2020

Tighter Bounds on the Independence Number of the Birkhoff Graph

The Birkhoff graph ℬ_n is the Cayley graph of the symmetric group S_n, w...
∙ 02/09/2021

Nonstochastic Bandits with Infinitely Many Experts

We study the problem of nonstochastic bandits with infinitely many exper...
∙ 04/01/2018

Online learning with graph-structured feedback against adaptive adversaries

We derive upper and lower bounds for the policy regret of T-round online...
∙ 11/06/2015

Optimal Non-Asymptotic Lower Bound on the Minimax Regret of Learning with Expert Advice

We prove non-asymptotic lower bounds on the expectation of the maximum o...
∙ 06/07/2022

Decentralized Online Regularized Learning Over Random Time-Varying Graphs

We study the decentralized online regularized linear regression algorith...
