Model-Free Algorithm with Improved Sample Efficiency for Zero-Sum Markov Games

by   Songtao Feng, et al.

The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an ϵ-optimal Nash Equilibrium (NE) with the sample complexity of O(H^3SAB/ϵ^2), which is optimal in the dependence of the horizon H and the number of states S (where A and B denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such an optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the H dependence as model-based algorithms. The main improvement of the dependency on H arises by leveraging the popular variance reduction technique based on the reference-advantage decomposition previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history in order to achieve the desired improvement in the sample efficiency.


page 1

page 2

page 3

page 4


A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Model-based algorithms—algorithms that decouple learning of the model an...

Towards General Function Approximation in Zero-Sum Markov Games

This paper considers two-player zero-sum finite-horizon Markov games wit...

Learning to Coordinate Efficiently: A Model-based Approach

In common-interest stochastic games all players receive an identical pay...

Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity

Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control ...

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Model-based reinforcement learning (RL), which finds an optimal policy u...

Representation Learning for General-sum Low-rank Markov Games

We study multi-agent general-sum Markov games with nonlinear function ap...

Model-Based Reinforcement Learning Is Minimax-Optimal for Offline Zero-Sum Markov Games

This paper makes progress towards learning Nash equilibria in two-player...

Please sign up or login with your details

Forgot password? Click here to reset