Constant regret for sequence prediction with limited advice

by El Mehdi Saad, et al.

We investigate the problem of cumulative regret minimization for individual sequence prediction with respect to the best expert in a finite family of size K, under limited access to information. We assume that in each round the learner can predict using a convex combination of at most p experts, and can then observe a posteriori the losses of at most m experts. We assume the loss function is range-bounded and exp-concave. In the standard multi-armed bandit setting, where the learner may play only one expert per round and observe only its feedback, the known optimal regret bounds are of order O(√(KT)). We show that allowing the learner to play one additional expert per round and to observe one additional feedback substantially improves the guarantees on regret. We provide a strategy that combines only p = 2 experts per round for prediction and observes the losses of m ≥ 2 experts. Its randomized regret (with respect to the internal randomization of the learner's strategy) is of order O((K/m) log(Kδ⁻¹)) with probability 1 − δ, i.e., independent of the horizon T ("constant" or "fast-rate" regret), provided p ≥ 2 and m ≥ 3. We prove that this rate is optimal up to a logarithmic factor in K. In the case p = m = 2, we provide an upper bound of order O(K² log(Kδ⁻¹)), with probability 1 − δ. Our strategies require no prior knowledge of the horizon T or of the confidence parameter δ. Finally, we show that if the learner is constrained to observe only one expert feedback per round, the worst-case regret is the "slow rate" Ω(√(KT)), suggesting that simultaneous observation of at least two experts per round is necessary to achieve constant regret.
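To make the limited-advice setting concrete, the following is a minimal sketch (not the paper's algorithm, whose pairing and concentration arguments are omitted) of a generic importance-weighted exponentially weighted forecaster operating under the constraints described above: play a convex combination of p = 2 sampled experts, then observe the losses of m = 3 experts. The squared loss on [0, 1], the toy data, and the learning rate eta are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T, m, eta = 5, 2000, 3, 0.5  # K experts, horizon T, m observed losses, learning rate

# Toy expert predictions and targets; square loss is exp-concave and bounded on [0, 1].
expert_preds = rng.random((T, K))
targets = np.clip(expert_preds[:, 0] + 0.05 * rng.standard_normal(T), 0.0, 1.0)

log_w = np.zeros(K)  # log-weights of the exponentially weighted forecaster
total_loss = 0.0

for t in range(T):
    w = np.exp(log_w - log_w.max())
    p = w / w.sum()

    # Prediction: a convex combination of p = 2 experts sampled from the weights.
    i, j = rng.choice(K, size=2, replace=False, p=p)
    pred = 0.5 * (expert_preds[t, i] + expert_preds[t, j])
    total_loss += (pred - targets[t]) ** 2

    # Feedback: losses of m experts drawn uniformly, importance-weighted so that
    # the loss estimate is unbiased, then an exponential-weights update.
    S = rng.choice(K, size=m, replace=False)
    est = np.zeros(K)
    est[S] = (expert_preds[t, S] - targets[t]) ** 2 * (K / m)
    log_w -= eta * est

best_loss = ((expert_preds - targets[:, None]) ** 2).sum(axis=0).min()
regret = total_loss - best_loss
```

Since every per-round loss lies in [0, 1], the cumulative regret of this sketch is trivially at most T; the point of the paper is that with p ≥ 2 and m ≥ 3 a suitably designed strategy keeps it bounded independently of T.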



