Distributed Dynamic Programming forNetworked Multi-Agent Markov Decision Processes
The main goal of this paper is to investigate distributed dynamic programming (DP) to solve networked multi-agent Markov decision problems (MDPs). We consider a distributed multi-agent case, where each agent does not have an access to the rewards of other agents except for its own reward. Moreover, each agent can share their parameters with its neighbors over a communication network represented by a graph. We propose a distributed DP in the continuous-time domain, and prove its convergence through control theoretic viewpoints. The proposed analysis can be viewed as a preliminary ordinary differential equation (ODE) analysis of a distributed temporal difference learning algorithm, whose convergence can be proved using Borkar-Meyn theorem and the single time-scale approach.
READ FULL TEXT