Rollout Sampling Policy Iteration for Decentralized POMDPs

03/15/2012
by Feng Wu, et al.

We present decentralized rollout sampling policy iteration (DecRSPI), a new algorithm for multi-agent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states, then computes a joint policy for each belief state based on rollout estimates. A new policy representation allows solutions to be represented compactly. The key benefits of the algorithm are its time complexity, which is linear in the number of agents, its bounded memory usage, and its good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach.
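
The two Monte-Carlo ingredients named in the abstract, sampling reachable states and estimating a joint policy's value by rollouts, can be illustrated with a minimal Python sketch. This is not the DecRSPI algorithm itself: the toy simulator `step`, the helpers `sample_reachable_states` and `rollout_value`, and all parameters are hypothetical names chosen for illustration. The only assumption carried over from the abstract is that the model is available solely as a black-box simulator.

```python
import random

# Hypothetical toy simulator (not from the paper): two hidden states, and the
# team earns +1 only when every agent picks the action equal to the hidden state.
def step(state, joint_action, rng):
    reward = 1.0 if all(a == state for a in joint_action) else 0.0
    next_state = rng.choice([0, 1])
    # Each agent privately receives a noisy observation of the next state.
    obs = tuple(next_state if rng.random() < 0.8 else 1 - next_state
                for _ in joint_action)
    return next_state, obs, reward

def sample_reachable_states(policies, horizon, n_samples, rng):
    """Simulate the given policies and record the (state, observations) pairs
    reached at each step; the samples at step t stand in for a belief state."""
    samples = [[] for _ in range(horizon)]
    for _ in range(n_samples):
        state = rng.choice([0, 1])
        obs = (0,) * len(policies)  # dummy initial observation
        for t in range(horizon):
            samples[t].append((state, obs))
            joint_action = tuple(pi(o) for pi, o in zip(policies, obs))
            state, obs, _ = step(state, joint_action, rng)
    return samples

def rollout_value(policies, start_samples, horizon, n_rollouts, rng):
    """Monte-Carlo estimate of the joint policy's expected total reward when
    started from points drawn out of the sampled belief state."""
    total = 0.0
    for _ in range(n_rollouts):
        state, obs = rng.choice(start_samples)
        for _ in range(horizon):
            # Each agent acts on its own observation only (decentralized).
            joint_action = tuple(pi(o) for pi, o in zip(policies, obs))
            state, obs, reward = step(state, joint_action, rng)
            total += reward
    return total / n_rollouts
```

In this sketch a joint policy is a tuple of per-agent functions from an observation to an action, for example `(lambda o: o, lambda o: o)`. A policy-improvement loop could compare candidate joint policies by their `rollout_value` at each sampled belief state and keep the best one; how DecRSPI actually performs that step is described in the full paper, not here.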

research
01/25/2019

Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies

Decision making in multi-agent systems (MAS) is a great challenge due to...
research
03/24/2021

Multi-Agent Off-Policy TD Learning: Finite-Time Analysis with Near-Optimal Sample Complexity and Communication Complexity

The finite-time convergence of off-policy TD learning has been comprehen...
research
03/20/2013

A Monte-Carlo Algorithm for Dempster-Shafer Belief

A very computationally-efficient Monte-Carlo algorithm for the calculati...
research
01/15/2014

Policy Iteration for Decentralized Control of Markov Decision Processes

Coordination of distributed agents is required for problems arising in m...
research
10/17/2018

Multi-Agent Fully Decentralized Off-Policy Learning with Linear Convergence Rates

In this paper we develop a fully decentralized algorithm for policy eval...
research
01/15/2014

Monte Carlo Sampling Methods for Approximating Interactive POMDPs

Partially observable Markov decision processes (POMDPs) provide a princi...
research
05/01/2015

Stick-Breaking Policy Learning in Dec-POMDPs

Expectation maximization (EM) has recently been shown to be an efficient...
