MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

09/02/2021
by Eshagh Kargar, et al.

This work considers the problem of learning cooperative policies in multi-agent settings where the environment is partially observable and non-stationary and the agents have no communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). MACRPO integrates information across agents and across time in two novel ways. First, we use a recurrent layer in the critic's network architecture and propose a new framework that trains this layer on a meta-trajectory. This allows the network to learn the cooperation and interaction dynamics between agents, and also to handle partial observability. Second, we propose a new advantage function that incorporates the other agents' rewards and value functions. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces: Deepdrive-Zero, Multi-Walker, and the Particle environment. We compare the results with several ablations, with state-of-the-art multi-agent algorithms such as QMIX and MADDPG, and with single-agent methods that share parameters between agents, such as IMPALA and APEX. In these experiments, MACRPO outperforms the other algorithms. The code is available online at https://github.com/kargarisaac/macrpo.
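The two ideas in the abstract can be illustrated with a small sketch. It is not the paper's formulation, which the abstract does not spell out. It assumes that a meta-trajectory interleaves the agents' per-step data into one sequence, and that the advantage is a GAE-style estimate computed from the team-mean reward and team-mean value. The function names `build_meta_trajectory` and `shared_gae`, the interleaving order, and the use of a plain mean are all hypothetical choices made for illustration.

```python
from typing import Any, List, Sequence, Tuple


def build_meta_trajectory(
    per_agent_steps: Sequence[Sequence[Any]],
) -> List[Tuple[int, int, Any]]:
    """Interleave per-agent sequences into one sequence (assumed layout).

    per_agent_steps[i][t] is whatever agent i recorded at time step t.
    The result is ordered by time step, then by agent index, so a
    recurrent critic would see every agent's data for step t before
    moving on to step t + 1.
    """
    n_steps = len(per_agent_steps[0])
    if any(len(traj) != n_steps for traj in per_agent_steps):
        raise ValueError("all agents must have the same number of steps")
    meta = []
    for t in range(n_steps):
        for agent_id, traj in enumerate(per_agent_steps):
            meta.append((t, agent_id, traj[t]))
    return meta


def shared_gae(
    rewards: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    last_values: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """GAE-style advantage over team-mean rewards and values (assumption).

    rewards[i][t] and values[i][t] belong to agent i at step t;
    last_values[i] is agent i's bootstrap value after the final step.
    """
    n_agents = len(rewards)
    n_steps = len(rewards[0])
    mean_r = [sum(r[t] for r in rewards) / n_agents for t in range(n_steps)]
    mean_v = [sum(v[t] for v in values) / n_agents for t in range(n_steps)]
    mean_v.append(sum(last_values) / n_agents)

    advantages = [0.0] * n_steps
    running = 0.0
    # Standard backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(n_steps)):
        delta = mean_r[t] + gamma * mean_v[t + 1] - mean_v[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With two agents and two steps, `build_meta_trajectory([["a0", "a1"], ["b0", "b1"]])` yields the order a0, b0, a1, b1. Averaging is only one way to mix in the other agents' rewards and values; a weighted sum would fit the same description.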


