Anytime PSRO for Two-Player Zero-Sum Games

01/19/2022
by Stephen McAleer et al.

Policy space response oracles (PSRO) is a multi-agent reinforcement learning algorithm that has achieved state-of-the-art performance in very large two-player zero-sum games. PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next. We propose anytime double oracle (ADO), a tabular double oracle algorithm for two-player zero-sum games that is guaranteed to converge to a Nash equilibrium while decreasing exploitability from one iteration to the next. Unlike DO, in which the restricted distribution is based on the restricted game formed by each player's strategy sets, ADO finds the restricted distribution for each player that minimizes its exploitability against any policy in the full, unrestricted game. We also propose a method of finding this restricted distribution via a no-regret algorithm updated against best responses, called RM-BR DO. Finally, we propose anytime PSRO (APSRO), a version of ADO that calculates best responses via reinforcement learning. In experiments on Leduc poker and random normal form games, we show that our methods achieve far lower exploitability than DO and PSRO and decrease exploitability monotonically.
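The abstract describes two pieces that a short sketch can make concrete: an outer double-oracle loop that grows each player's restricted strategy set with best responses computed in the full game, and an inner no-regret (regret matching) update, played against full-game best responses, that chooses each player's restricted distribution. The snippet below is a minimal illustration of those ideas on a zero-sum matrix game, not the authors' implementation; the function names (rm_br_distribution, anytime_do_sketch) and all parameters are illustrative assumptions.

import numpy as np

def rm_br_distribution(A, population, iters=2000):
    # Regret matching over a restricted population of row strategies, where
    # the opponent (the column player, minimizing A) always plays a best
    # response in the full, unrestricted game -- the "RM-BR" idea above.
    sub = A[population, :]                      # payoffs of the restricted rows
    k = len(population)
    regrets = np.zeros(k)
    avg_dist = np.zeros(k)
    dist = np.full(k, 1.0 / k)
    for _ in range(iters):
        br_col = int(np.argmin(dist @ sub))     # full-game best response
        u = sub[:, br_col]                      # payoff of each row vs. that BR
        regrets += u - dist @ u                 # regret-matching update
        pos = np.maximum(regrets, 0.0)
        dist = pos / pos.sum() if pos.sum() > 0 else np.full(k, 1.0 / k)
        avg_dist += dist
    return avg_dist / iters                     # average restricted distribution

def anytime_do_sketch(A, outer_iters=20):
    # Double-oracle-style outer loop: each iteration adds, for each player,
    # a best response (computed in the full game) to the opponent's current
    # restricted distribution.
    row_pop, col_pop = [0], [0]                 # seed each population
    for _ in range(outer_iters):
        row_dist = rm_br_distribution(A, row_pop)
        col_dist = rm_br_distribution(-A.T, col_pop)
        new_row = int(np.argmax(A[:, col_pop] @ col_dist))
        new_col = int(np.argmin(row_dist @ A[row_pop, :]))
        if new_row not in row_pop:
            row_pop.append(new_row)
        if new_col not in col_pop:
            col_pop.append(new_col)
    return (row_pop, col_pop,
            rm_br_distribution(A, row_pop),
            rm_br_distribution(-A.T, col_pop))

On a random payoff matrix, for example anytime_do_sketch(np.random.randn(60, 60)), the returned restricted distributions should approach a Nash equilibrium of the full game as the populations grow, which loosely mirrors the behavior ADO and RM-BR DO are designed to provide while keeping exploitability from increasing between iterations.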


Related research

03/11/2021 · XDO: A Double Oracle Algorithm for Extensive-Form Games
Policy Space Response Oracles (PSRO) is a deep reinforcement learning al...

07/04/2020 · Off-Policy Exploitability-Evaluation and Equilibrium-Learning in Two-Player Zero-Sum Markov Games
Off-policy evaluation (OPE) is the problem of evaluating new policies us...

03/13/2021 · Online Double Oracle
Solving strategic games with huge action space is a critical yet under-e...

02/27/2020 · Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games
Zero-sum games have long guided artificial intelligence research, since ...

06/02/2019 · Feature-Based Q-Learning for Two-Player Stochastic Games
Consider a two-player zero-sum stochastic game where the transition func...

02/09/2023 · Regularization for Strategy Exploration in Empirical Game-Theoretic Analysis
In iterative approaches to empirical game-theoretic analysis (EGTA), the...

09/25/2020 · Double Oracle Algorithm for Computing Equilibria in Continuous Games
Many efficient algorithms have been designed to recover Nash equilibria ...
