Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games

04/23/2018
by Vojtěch Kovařík, et al.

Hannan consistency, or no external regret, is a key concept for learning in games. An action selection algorithm is Hannan consistent (HC) if its performance is eventually as good as selecting the best fixed action in hindsight. If both players in a zero-sum normal form game use a Hannan consistent algorithm, their average behavior converges to a Nash equilibrium of the game. A similar result is known for extensive form games, but there the played strategies need to be Hannan consistent with respect to counterfactual values, which are often difficult to obtain. We study zero-sum extensive form games with simultaneous moves but otherwise perfect information. These games generalize normal form games and are a special case of extensive form games. We study whether applying HC algorithms at each decision point of these games directly to the observed payoffs leads to convergence to a Nash equilibrium. This learning process corresponds to a class of Monte Carlo Tree Search algorithms, which are popular for playing simultaneous move games but have no known performance guarantees. We show that using HC algorithms directly on the observed payoffs is not sufficient to guarantee convergence. With additional averaging over joint actions, convergence is guaranteed, but empirically slower. We further define an additional property of HC algorithms that is sufficient to guarantee convergence without the averaging, and we show empirically that commonly used HC algorithms have this property.
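The normal-form result quoted above (two HC learners in a zero-sum game have average strategies that converge to a Nash equilibrium) can be illustrated with a small sketch. It assumes regret matching as the HC algorithm and lets each player update on full-information expected payoffs, not on the sampled payoffs that MCTS observes. The 2x2 payoff matrix `A` and the helper names are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical 2x2 zero-sum game (illustration only, not from the paper).
# Entries are payoffs to the row (maximizing) player; the column player receives -A.
A = [[2.0, -1.0],
     [-1.0, 1.0]]


def regret_matching_strategy(cum_regret: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [r / total for r in positive]
    return [1.0 / len(cum_regret)] * len(cum_regret)


def self_play(iterations: int) -> Tuple[List[float], List[float]]:
    """Both players run regret matching on expected payoffs; return their average strategies."""
    n_rows, n_cols = len(A), len(A[0])
    row_regret = [0.0] * n_rows
    col_regret = [0.0] * n_cols
    row_sum = [0.0] * n_rows
    col_sum = [0.0] * n_cols
    for _ in range(iterations):
        p = regret_matching_strategy(row_regret)
        q = regret_matching_strategy(col_regret)
        # Expected payoff of each pure action against the opponent's current mixed strategy.
        row_values = [sum(A[i][j] * q[j] for j in range(n_cols)) for i in range(n_rows)]
        col_values = [-sum(A[i][j] * p[i] for i in range(n_rows)) for j in range(n_cols)]
        row_expected = sum(p[i] * row_values[i] for i in range(n_rows))
        col_expected = sum(q[j] * col_values[j] for j in range(n_cols))
        # External regret: how much better each fixed action would have done this round.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_expected
            row_sum[i] += p[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_expected
            col_sum[j] += q[j]
    # Only the *average* strategies are guaranteed to approach equilibrium.
    return [s / iterations for s in row_sum], [s / iterations for s in col_sum]


if __name__ == "__main__":
    p_avg, q_avg = self_play(100_000)
    # This game's unique Nash equilibrium is p = q = (0.4, 0.6), with value 0.2.
    print("row average strategy:", p_avg)
    print("column average strategy:", q_avg)
```

Because regret matching bounds each player's average external regret by roughly (payoff range) x sqrt(number of actions / T), the gap between the two players' best-response values against the average profile shrinks at the same rate, which is exactly the HC-implies-equilibrium argument for normal form games. The paper's question is whether this still holds when the same kind of update is applied at every node of a simultaneous-move game tree using observed rather than expected payoffs.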


