Distributed Policy Evaluation Under Multiple Behavior Strategies

12/30/2013
by Sergio Valcarcel Macua, et al.

We apply diffusion strategies to develop a fully distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces the bias and variance of the prediction error; more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sampling only a small portion of the state space).
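
As a rough illustration of the cooperation gain described above, the following is a minimal sketch, not the paper's algorithm. The abstract does not give the local update rule, so the sketch substitutes a plain on-policy TD(0) step with one-hot features and uniform neighborhood averaging as the combination rule. The two-state environment, the function names (`adapt`, `combine`, `run`), and all parameter values are hypothetical choices made for the example. Each agent only ever observes one of the two transitions, so neither can estimate both state values alone.

```python
# Toy adapt-then-combine (diffusion-style) value estimation.
# Assumptions (not from the paper): deterministic 2-state chain,
# one-hot features, on-policy TD(0) as the local update,
# uniform averaging over each agent's neighborhood.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def adapt(theta, sample, gamma, step):
    """Local step: one TD(0) update from the agent's own transition."""
    phi, reward, phi_next = sample
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    return [t + step * delta * p for t, p in zip(theta, phi)]

def combine(psi, neighbors):
    """Each agent averages the intermediate estimates of itself and its neighbors."""
    combined = []
    for k, nbrs in enumerate(neighbors):
        group = [k] + list(nbrs)
        dim = len(psi[k])
        combined.append([sum(psi[l][j] for l in group) / len(group)
                         for j in range(dim)])
    return combined

def run(samples, neighbors, gamma=0.5, step=0.5, iterations=300):
    """Constant step-size adapt-then-combine loop; returns one estimate per agent."""
    theta = [[0.0, 0.0] for _ in samples]
    for _ in range(iterations):
        psi = [adapt(theta[k], samples[k], gamma, step)
               for k in range(len(samples))]
        theta = combine(psi, neighbors)
    return theta

# State 0 -> state 1 with reward 1; state 1 -> state 0 with reward 0.
# With gamma = 0.5 the true values are V(0) = 4/3 and V(1) = 2/3.
sees_state_0 = ([1.0, 0.0], 1.0, [0.0, 1.0])
sees_state_1 = ([0.0, 1.0], 0.0, [1.0, 0.0])

together = run([sees_state_0, sees_state_1], neighbors=[[1], [0]])
alone = run([sees_state_0], neighbors=[[]])
print("cooperating agents:", together)
print("isolated agent:    ", alone)
```

In this toy setting the cooperating agents agree on estimates close to the true values, while the isolated agent never updates its estimate for the state it does not visit and so settles on a biased value for the state it does visit. The sketch omits the off-policy case entirely; handling it would require importance weighting and a local update that remains stable off-policy.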

research
10/17/2018

Multi-Agent Fully Decentralized Off-Policy Learning with Linear Convergence Rates

In this paper we develop a fully decentralized algorithm for policy eval...
research
10/17/2018

Multi-Agent Fully Decentralized Value Function Learning with Linear Convergence Rates

This work develops a fully decentralized multi-agent algorithm for polic...
research
09/22/2014

Distributed Clustering and Learning Over Networks

Distributed processing over networks relies on in-network processing and...
research
12/03/2021

Distributed Adaptive Learning Under Communication Constraints

This work examines adaptive distributed learning strategies designed to ...
research
05/22/2018

Learning over Multitask Graphs - Part II: Performance Analysis

Part I of this paper formulated a multitask optimization problem where a...
research
01/06/2021

One-shot Policy Elicitation via Semantic Reward Manipulation

Synchronizing expectations and knowledge about the state of the world is...
research
05/29/2018

Learning Under Distributed Features

This work studies the problem of learning under both large data and larg...