Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning

08/30/2021
by Shenao Zhang, et al.

In multi-agent reinforcement learning, the behaviors that agents learn in a single Markov Game (MG) are typically confined to the given number of agents (i.e., the population size). Each MG induced by a different population size may possess distinct optimal joint strategies and game-specific knowledge, which modern multi-agent algorithms model independently. In this work, we focus on creating agents that generalize across population-varying MGs. Instead of learning a unimodal policy, each agent learns a policy set formed by effective strategies across a variety of games. We propose Meta Representations for Agents (MRA), which explicitly model the game-common and game-specific strategic knowledge. By representing the policy sets with multi-modal latent policies, the common strategic knowledge and diverse strategic modes are discovered through an iterative optimization procedure. We prove that, as an approximation to a constrained mutual information maximization objective, the learned policies can reach a Nash Equilibrium in every evaluation MG under the assumption of a Lipschitz game on a sufficiently large latent space. When MRA is deployed with practical latent models of limited size, fast adaptation can be achieved by leveraging first-order gradient information. Extensive experiments show the effectiveness of MRA in both training performance and generalization ability in hard and unseen games.
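To make the idea of a latent-conditioned policy with first-order adaptation concrete, here is a minimal toy sketch. It is not the authors' MRA implementation: the linear mixing of a shared ("game-common") logit vector with latent-weighted ("game-specific") mode vectors, the one-step game with a fixed reward per action, and every name below (`policy`, `adapt_latent`, `modes`, `common`) are assumptions chosen purely for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(common_logits, modes, z):
    # Hypothetical parameterization: shared logits plus a latent-weighted
    # combination of mode-specific logit vectors.
    logits = [
        c + sum(z_k * mode[a] for z_k, mode in zip(z, modes))
        for a, c in enumerate(common_logits)
    ]
    return softmax(logits)

def expected_reward(probs, rewards):
    return sum(p * r for p, r in zip(probs, rewards))

def adapt_latent(common_logits, modes, z, rewards, lr=0.5, steps=50):
    # First-order gradient ascent on the latent z only; the shared logits
    # and the mode vectors stay fixed during adaptation.
    z = list(z)
    for _ in range(steps):
        probs = policy(common_logits, modes, z)
        value = expected_reward(probs, rewards)
        # Gradient of the expected reward w.r.t. each logit: pi_a * (r_a - J).
        grad_logits = [p * (r - value) for p, r in zip(probs, rewards)]
        # Chain rule through the linear mixing: dJ/dz_k = sum_a mode_k[a] * dJ/dlogit_a.
        grad_z = [sum(m_a * g for m_a, g in zip(mode, grad_logits)) for mode in modes]
        z = [z_k + lr * g for z_k, g in zip(z, grad_z)]
    return z

# Toy "unseen game": three actions, and the third action pays best.
common = [0.0, 0.0, 0.0]
modes = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
z0 = [0.0, 0.0]
rewards = [0.0, 0.2, 1.0]

before = expected_reward(policy(common, modes, z0), rewards)
z_adapted = adapt_latent(common, modes, z0, rewards)
after = expected_reward(policy(common, modes, z_adapted), rewards)
```

In this toy setting only the low-dimensional latent is updated, so adaptation needs nothing beyond first-order gradients of the return with respect to `z`; the paper's actual latent model and training objective may differ substantially.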


research
09/14/2020

Multi-Agent Reinforcement Learning in Cournot Games

In this work, we study the interaction of strategic agents in continuous...
research
10/18/2022

RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning

Despite the recent advancement in multi-agent reinforcement learning (MA...
research
02/07/2023

Population-size-Aware Policy Optimization for Mean-Field Games

In this work, we attempt to bridge the two fields of finite-agent and in...
research
06/17/2020

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

This paper introduces two metrics (cycle-based and memory-based metrics)...
research
07/17/2023

Meta-Value Learning: a General Framework for Learning with Learning Awareness

Gradient-based learning in multi-agent systems is difficult because the ...
research
05/17/2021

To be a fast adaptive learner: using game history to defeat opponents

In many real-world games, such as traders repeatedly bargaining with cus...
research
12/21/2017

A Deep Policy Inference Q-Network for Multi-Agent Systems

We present DPIQN, a deep policy inference Q-network that targets multi-a...
