Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning

In cooperative stochastic games, multiple agents learn jointly optimal actions in an unknown environment to achieve a common goal. In many real-world applications, however, constraints are imposed on the actions that the agents can jointly take. In such scenarios, the agents aim to learn joint actions that achieve the common goal (minimizing a specified cost function) while meeting the given constraints (specified via certain penalty functions). In this paper, we consider a relaxation of the constrained optimization problem obtained by constructing the Lagrangian of the cost and penalty functions. We propose a nested actor-critic approach to solve this relaxed problem. In this approach, an actor-critic scheme improves the policy for a given Lagrange parameter on a faster timescale, as in the classical actor-critic architecture. A meta actor-critic scheme then uses these faster-timescale policy updates to improve the Lagrange parameters on the slower timescale. Using the proposed nested actor-critic schemes, we develop three Nested Actor-Critic (N-AC) algorithms. Through experiments on constrained cooperative tasks, we show the effectiveness of the proposed algorithms.
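The two-timescale structure described above can be illustrated with a minimal sketch. It is not one of the N-AC algorithms: it assumes a single agent, a single state, exact expectations in place of a learned critic, and plain projected gradient ascent on the Lagrange parameter in place of the meta actor-critic. The names `solve_relaxed`, `costs`, `penalties`, `alpha`, `fast_lr` and `slow_lr` are hypothetical and chosen only for this illustration.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def solve_relaxed(costs, penalties, alpha, steps=2000, fast_lr=0.5, slow_lr=0.01):
    """Primal-dual sketch for: minimize E[cost] subject to E[penalty] <= alpha.

    Assumption: one state, exact expectations, softmax policy. The policy
    parameters use the larger step size (faster timescale); the Lagrange
    parameter uses the smaller one (slower timescale).
    """
    theta = [0.0] * len(costs)  # policy parameters
    lam = 0.0                   # Lagrange parameter, kept non-negative
    for _ in range(steps):
        pi = softmax(theta)
        # Per-action Lagrangian: cost plus weighted constraint term.
        lag = [c + lam * (p - alpha) for c, p in zip(costs, penalties)]
        avg = sum(q * l for q, l in zip(pi, lag))
        # Faster timescale: gradient descent on the expected Lagrangian.
        # For a softmax policy, d/d theta_a of E[lag] is pi_a * (lag_a - avg).
        theta = [t - fast_lr * q * (l - avg) for t, q, l in zip(theta, pi, lag)]
        # Slower timescale: projected gradient ascent on lam, driven by the
        # expected constraint violation under the policy used in this step.
        violation = sum(q * p for q, p in zip(pi, penalties)) - alpha
        lam = max(0.0, lam + slow_lr * violation)
    return softmax(theta), lam


if __name__ == "__main__":
    # Loose constraint: lam stays at zero and the cheaper action dominates.
    print(solve_relaxed([1.0, 2.0], [1.0, 0.0], alpha=2.0))
```

When the constraint is loose, the Lagrange parameter stays at zero and the policy concentrates on the cheapest action. When the constraint binds, a plain primal-dual iteration like this one may oscillate around the saddle point rather than settle, so the sketch should be read as an illustration of the update structure only.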




Attention Actor-Critic algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning

In this work, we consider the problem of computing optimal actions for R...

Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games

Recent success in cooperative multi-agent reinforcement learning (MARL) ...

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

In the field of reinforcement learning, because of the high cost and ris...

Refined Continuous Control of DDPG Actors via Parametrised Activation

In this paper, we propose enhancing actor-critic reinforcement learning ...

Actor Critic with Differentially Private Critic

Reinforcement learning algorithms are known to be sample inefficient, an...

Distributionally-Constrained Policy Optimization via Unbalanced Optimal Transport

We consider constrained policy optimization in Reinforcement Learning, w...

Solving Continuous Control via Q-learning

While there has been substantial success in applying actor-critic method...
