Multi-agent Natural Actor-critic Reinforcement Learning Algorithms

by Prashant Trivedi, et al.

Both single-agent and multi-agent actor-critic algorithms are an important class of Reinforcement Learning algorithms. In this work, we propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms. The agents' objective is to collectively learn a joint policy that maximizes the sum of the averaged long-term returns of these agents. In the absence of a central controller, agents communicate information to their neighbors via a time-varying communication network while preserving privacy. We prove the convergence of all three MAN algorithms, which use linear function approximation, to a globally asymptotically stable point of the ODE corresponding to the actor update. We use the Fisher information matrix to obtain the natural gradients. The Fisher information matrix captures the curvature of the Kullback-Leibler (KL) divergence between policies at successive iterates. We also show that the gradient of this KL divergence between policies of successive iterates is proportional to the objective function's gradient. Our MAN algorithms indeed use this representation of the objective function's gradient. Under certain conditions on the Fisher information matrix, we prove that at each iterate, the optimal value via MAN algorithms can be better than that of the multi-agent actor-critic (MAAC) algorithm using the standard gradients. To validate the usefulness of our proposed algorithms, we implement all three MAN algorithms on a bi-lane traffic network to reduce the average network congestion. We observe an almost 25% reduction in average congestion with two of the MAN algorithms; the average congestion of the third MAN algorithm is on par with the MAAC algorithm. We also consider a generic 15-agent MARL setting; the performance of the MAN algorithms is again as good as that of the MAAC algorithm. We attribute the better performance of the MAN algorithms to their use of the above representation.
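The abstract's key mechanism — preconditioning the policy gradient with the inverse Fisher information matrix — can be illustrated with a minimal single-agent sketch. This is not the paper's MAN algorithms; it is a hypothetical example for a linear softmax policy, where the Fisher matrix F = E[psi psi^T] (with psi the score function) turns a vanilla gradient g into the natural gradient F^{-1} g. All function names here are illustrative.

```python
import numpy as np

def softmax_policy(theta, features):
    """Action probabilities for a linear softmax policy.
    features: (num_actions, dim) matrix; theta: (dim,) parameter vector."""
    logits = features @ theta
    logits -= logits.max()              # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def score(theta, features, a):
    """Score function grad_theta log pi(a) = phi(a) - E_pi[phi]."""
    p = softmax_policy(theta, features)
    return features[a] - p @ features

def natural_gradient(theta, features, vanilla_grad, reg=1e-6):
    """Precondition vanilla_grad with the (regularized) Fisher matrix
    F = sum_a pi(a) * psi_a psi_a^T, i.e. return F^{-1} g."""
    p = softmax_policy(theta, features)
    dim = features.shape[1]
    F = np.zeros((dim, dim))
    for a, pa in enumerate(p):
        psi = score(theta, features, a)
        F += pa * np.outer(psi, psi)
    return np.linalg.solve(F + reg * np.eye(dim), vanilla_grad)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))         # 4 actions, 3-dim features (toy data)
theta = np.zeros(3)
g = rng.normal(size=3)                  # stand-in for an estimated policy gradient
ng = natural_gradient(theta, feats, g)  # natural-gradient direction, shape (3,)
```

Because F captures the local curvature of the KL divergence between successive policies, a step along `ng` corresponds to a fixed-size step in policy space rather than in parameter space, which is the intuition behind the claimed per-iterate improvement over standard-gradient MAAC.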

