Near-Optimal No-Regret Learning in General Games

08/16/2021
by Constantinos Daskalakis, et al.

We show that Optimistic Hedge, a common variant of multiplicative-weights-updates with recency bias, attains poly(log T) regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after T rounds of interaction, each player experiences total regret that is poly(log T). Our bound is an exponential improvement over the O(T^{1/2}) regret attainable by standard no-regret learners in games, the O(T^{1/4}) regret attainable by no-regret learners with recency bias (Syrgkanis et al., 2015), and the O(T^{1/6}) bound that was recently shown for Optimistic Hedge in the special case of two-player games (Chen & Peng, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of Õ(1/T).
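
To make the update concrete, here is a minimal sketch of Optimistic Hedge in self-play, written for a two-player game for brevity. It uses the standard form of the update, in which each player's weights are proportional to exp(-eta * (cumulative loss + most recent loss)), so the latest loss vector is counted twice. The function names, the example loss matrices, and the step size eta = 0.1 are illustrative assumptions and are not taken from the paper; the step size required by the poly(log T) analysis is not reproduced here.

```python
import math


def optimistic_hedge_step(cum_loss, last_loss, eta):
    """Return the next mixed strategy under Optimistic Hedge.

    Weights are proportional to exp(-eta * (cumulative loss + last loss)),
    i.e. plain Hedge with the most recent loss vector counted twice.
    """
    scores = [-eta * (c + l) for c, l in zip(cum_loss, last_loss)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def run_two_player(loss_a, loss_b, rounds, eta):
    """Self-play in a two-player game (hypothetical helper).

    loss_a[i][j] / loss_b[i][j]: loss in [0, 1] to the row / column player
    when the row player picks i and the column player picks j.
    Returns each player's external regret and the final strategies.
    """
    n, m = len(loss_a), len(loss_a[0])
    cum = [[0.0] * n, [0.0] * m]    # cumulative loss vectors
    last = [[0.0] * n, [0.0] * m]   # most recent loss vectors (zero at start)
    incurred = [0.0, 0.0]           # expected loss actually suffered
    x, y = [1.0 / n] * n, [1.0 / m] * m
    for _ in range(rounds):
        x = optimistic_hedge_step(cum[0], last[0], eta)
        y = optimistic_hedge_step(cum[1], last[1], eta)
        # Expected loss of each pure action against the opponent's mix.
        lx = [sum(loss_a[i][j] * y[j] for j in range(m)) for i in range(n)]
        ly = [sum(loss_b[i][j] * x[i] for i in range(n)) for j in range(m)]
        incurred[0] += sum(p * l for p, l in zip(x, lx))
        incurred[1] += sum(p * l for p, l in zip(y, ly))
        cum[0] = [c + l for c, l in zip(cum[0], lx)]
        cum[1] = [c + l for c, l in zip(cum[1], ly)]
        last = [lx, ly]
    # External regret: suffered loss minus loss of the best fixed action.
    regrets = [incurred[k] - min(cum[k]) for k in range(2)]
    return regrets, x, y


if __name__ == "__main__":
    # Illustrative general-sum game with losses in [0, 1] (made-up numbers).
    A = [[0.2, 0.9], [0.7, 0.1]]
    B = [[0.6, 0.3], [0.1, 0.8]]
    for T in (100, 1000, 10000):
        regrets, _, _ = run_two_player(A, B, T, eta=0.1)
        print(T, [round(r, 3) for r in regrets])
```

Running the script prints each player's regret for a few horizons T, which gives an informal way to see how slowly regret grows under self-play; it is an experiment on one small game, not a verification of the bound.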
