Multi-Agent Fully Decentralized Value Function Learning with Linear Convergence Rates

10/17/2018
by Lucas Cassano, et al.

This work develops a fully decentralized multi-agent algorithm for policy evaluation. The proposed scheme applies to two distinct scenarios. In the first, a collection of agents hold distinct datasets, gathered by following different behavior policies (none of which is required to explore the full state space) in different instances of the same environment, and they all collaborate to evaluate a common target policy. The networked approach allows for efficient exploration of the state space and lets all agents converge to the optimal solution even in situations where no agent could converge on its own without cooperation. The second scenario is that of multi-agent games, in which the state is global and rewards are local; here, agents collaborate to estimate the value function of a target team policy. The proposed algorithm combines off-policy learning, eligibility traces, and linear function approximation. It is of the variance-reduced kind and achieves linear convergence with O(1) memory requirements. We provide a theorem that guarantees the linear convergence of the algorithm and show simulations that illustrate the effectiveness of the method.

research
10/17/2018

Multi-Agent Fully Decentralized Off-Policy Learning with Linear Convergence Rates

In this paper we develop a fully decentralized algorithm for policy eval...
research
02/08/2023

Policy Evaluation in Decentralized POMDPs with Belief Sharing

Most works on multi-agent reinforcement learning focus on scenarios wher...
research
04/19/2023

Graph Exploration for Effective Multi-agent Q-Learning

This paper proposes an exploration technique for multi-agent reinforceme...
research
12/30/2013

Distributed Policy Evaluation Under Multiple Behavior Strategies

We apply diffusion strategies to develop a fully-distributed cooperative...
research
06/18/2020

Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning

In this paper we propose novel distributed gradient-based temporal diffe...
research
05/29/2018

Learning Under Distributed Features

This work studies the problem of learning under both large data and larg...
research
02/27/2023

Safe Multi-agent Learning via Trapping Regions

One of the main challenges of multi-agent learning lies in establishing ...
