Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games

by Dingyang Chen, et al.

We study policy gradient methods for Markov potential games (MPGs), a subclass of Markov games that extends normal-form potential games to the stateful setting and includes the important special case of fully cooperative games, where all agents share an identical reward function. We analyze the convergence of the policy gradient method for solving MPGs under softmax policy parameterization, both tabular and parameterized by general function approximators such as neural networks. We first establish asymptotic convergence of this method to a Nash equilibrium of MPGs for tabular softmax policies. Second, we derive finite-time performance guarantees for the policy gradient in two settings: 1) with log-barrier regularization, and 2) for the natural policy gradient under best-response dynamics (NPG-BR). Finally, extending the notions of price of anarchy (POA) and smoothness from normal-form games, we introduce the POA for MPGs and provide a POA bound for NPG-BR. To our knowledge, this is the first POA bound for solving MPGs. To support our theoretical results, we empirically compare the convergence rates and POA of policy gradient variants for both tabular and neural softmax policies.
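To make the setting concrete, here is a minimal sketch (not from the paper) of independent softmax policy gradient in a two-player coordination game, the simplest identical-interest potential game: each player maintains logits over its actions, and both ascend the exact gradient of the shared expected reward. The payoff matrix, learning rate, and iteration count below are illustrative assumptions.

```python
import numpy as np

# Illustrative shared payoff: both players earn 1 for matching actions.
# An identical-interest game is a potential game with potential = the payoff itself.
payoff = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

rng = np.random.default_rng(0)
theta = [rng.normal(size=2), rng.normal(size=2)]  # each player's logits

lr = 1.0
for _ in range(500):
    pi = [softmax(t) for t in theta]
    # Each player's per-action expected payoff, holding the opponent fixed.
    q1 = payoff @ pi[1]
    q2 = payoff.T @ pi[0]
    for i, q in enumerate((q1, q2)):
        p = pi[i]
        # Exact gradient of E[reward] w.r.t. softmax logits:
        # d/d theta_k = pi_k * (q_k - <pi, q>).
        theta[i] = theta[i] + lr * p * (q - p @ q)

pi = [softmax(t) for t in theta]
print([np.round(p, 3) for p in pi])  # both players concentrate on one matching action
```

With a generic random initialization the coupled gradient flow escapes the mixed saddle and both policies converge toward the same pure Nash equilibrium, illustrating the kind of convergence-to-Nash behavior the paper establishes for tabular softmax policies in MPGs.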



