Learning to Model Opponent Learning

by Ian Davies, et al.

Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and with their environment. The adaptation and learning of other agents induces non-stationarity in the environment dynamics. This poses a significant challenge for value-function-based algorithms, whose convergence guarantees usually rely on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings, as the partial observability that results from opponents' actions not being known introduces high variance into policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents, aiding its own decision making. Most prior works learn an opponent model by assuming the opponent employs a stationary policy or switches between a set of stationary policies. Such an approach can reduce the variance of training signals for policy search algorithms. However, in the multi-agent setting, agents have an incentive to continually adapt and learn, so assumptions of opponent stationarity are unrealistic. In this work, we develop a novel approach to modelling an opponent's learning dynamics, which we term Learning to Model Opponent Learning (LeMOL). We show that our structured opponent model is more accurate and stable than naive behaviour cloning baselines. We further show that opponent modelling can improve the performance of algorithmic agents in multi-agent settings.
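As a toy illustration of why the stationarity assumption criticised above breaks down, the sketch below (which is illustrative only and not the paper's LeMOL architecture) compares naive behaviour cloning, which fits a single action distribution to all observed opponent play, against a simple history-aware estimator that tracks how a learning opponent's policy drifts over time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "learning" opponent: its probability of playing action 1
# drifts over time as it adapts -- a stand-in for opponent learning.
T = 2000
drift = np.linspace(0.1, 0.9, T)      # P(a=1) moves from 0.1 to 0.9
actions = (rng.random(T) < drift).astype(int)

# Naive behaviour cloning: assume a stationary policy and fit one
# global action frequency from all observed data.
bc_estimate = actions.mean()          # averages early and late play

# History-aware estimator (a crude proxy for modelling learning
# dynamics): an exponential moving average that follows the drift.
alpha = 0.05
ema = 0.5
for a in actions:
    ema = (1 - alpha) * ema + alpha * a

print(f"true final P(a=1):      {drift[-1]:.2f}")
print(f"behaviour cloning:      {bc_estimate:.2f}")
print(f"history-aware estimate: {ema:.2f}")
```

The behaviour-cloning estimate lands near 0.5, matching neither the opponent's early nor current policy, while the history-aware estimate ends close to the opponent's final action probability; the paper's structured opponent model pursues this idea with a far richer model of the opponent's learning process.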

