Multi-Agent Reinforcement Learning with Reward Delays

12/02/2022 ∙ by Yuyang Zhang, et al.

This paper considers multi-agent reinforcement learning (MARL) in which rewards are received after delays and the delay time varies among agents. Building on the V-learning framework, this paper proposes MARL algorithms that deal efficiently with reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) at rate $\tilde{\mathcal{O}}\left(H^3\sqrt{S\mathcal{T}_K}/K + H^3\sqrt{SA}/\sqrt{K}\right)$, where $K$ is the number of episodes, $H$ is the planning horizon, $S$ is the size of the state space, $A$ is the size of the largest action space, and $\mathcal{T}_K$ is the measure of the total delay defined in the paper. Moreover, our algorithm can be extended to cases with infinite delays through a reward skipping scheme, where it achieves a convergence rate similar to that of the finite-delay case.
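To get a feel for the two terms in the finite-delay rate, the sketch below evaluates each one numerically. It is only an illustration: constants and logarithmic factors hidden by the tilde-O notation are dropped, the function names are hypothetical, and the total-delay measure is passed in as a plain nonnegative number because its definition is given in the paper rather than in this abstract. Comparing the two expressions algebraically, the delay term exceeds the delay-free term exactly when the total-delay measure is larger than A times K.

```python
import math


def cce_rate_terms(K, H, S, A, T_K):
    """Return (delay_term, delay_free_term) of the stated rate.

    Constants and log factors are ignored, so the values are only
    meaningful relative to each other. T_K is treated as an opaque
    nonnegative number (assumption: its definition is in the paper).
    """
    delay_term = H**3 * math.sqrt(S * T_K) / K
    delay_free_term = H**3 * math.sqrt(S * A) / math.sqrt(K)
    return delay_term, delay_free_term


def delay_term_dominates(K, A, T_K):
    """True when the delay term is the larger one.

    sqrt(S*T_K)/K > sqrt(S*A)/sqrt(K) simplifies to T_K > A*K.
    """
    return T_K > A * K


if __name__ == "__main__":
    delay, delay_free = cce_rate_terms(K=100, H=2, S=4, A=4, T_K=100)
    print(delay, delay_free, delay_term_dominates(K=100, A=4, T_K=100))
```

Under this reading, the delays do not change the order of the rate as long as the total-delay measure grows no faster than A times K.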
