On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality

10/21/2020
by Ezra Tampubolon, et al.

In this work, we study a system of two interacting, non-cooperative Q-learning agents in which one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur among independent learners. The resulting post-learning policies are almost optimal in the sense of the underlying game, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of the opponent's two subsequent actions and yields an optimal strategy provided that the opponent applies a stationary strategy. We also discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
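
To make the setting concrete, the following is a minimal sketch of two Q-learners with asymmetric information. It is not the algorithm from the paper. It assumes a repeated 2x2 matrix game with no environment state and no discounting, and the payoff tables `A` and `B`, the function `train`, and all hyperparameters are hypothetical choices made for illustration. The uninformed agent keeps one Q-value per own action, while the informed agent observes the other's action first and keeps one Q-value per (observed action, own action) pair.

```python
import random


def train(A, B, steps=20000, alpha=0.1, eps=0.1, seed=0):
    """Stateless Q-learning for two agents in a repeated matrix game.

    A[a][b] is the (assumed) payoff of the uninformed agent and B[a][b]
    the payoff of the informed agent when they play actions a and b.
    """
    rng = random.Random(seed)
    n_a, n_b = len(A), len(A[0])
    q_uninformed = [0.0] * n_a                          # indexed by own action
    q_informed = [[0.0] * n_b for _ in range(n_a)]      # indexed by [observed a][own b]

    def eps_greedy(q_row):
        # Explore uniformly with probability eps, otherwise act greedily.
        if rng.random() < eps:
            return rng.randrange(len(q_row))
        return max(range(len(q_row)), key=q_row.__getitem__)

    for _ in range(steps):
        a = eps_greedy(q_uninformed)       # acts without seeing the opponent
        b = eps_greedy(q_informed[a])      # acts after observing a
        # Discount-free Q-updates (a simplifying assumption of this sketch).
        q_uninformed[a] += alpha * (A[a][b] - q_uninformed[a])
        q_informed[a][b] += alpha * (B[a][b] - q_informed[a][b])
    return q_uninformed, q_informed


def greedy(q_row):
    return max(range(len(q_row)), key=q_row.__getitem__)


# Hypothetical payoffs: both agents prefer to match actions, and the
# uninformed agent prefers the outcome (1, 1) over (0, 0).
A = [[1.0, 0.0], [0.0, 2.0]]
B = [[1.0, 0.0], [0.0, 1.0]]

q_uninformed, q_informed = train(A, B)
uninformed_action = greedy(q_uninformed)
informed_response = [greedy(row) for row in q_informed]
print(uninformed_action, informed_response)
```

In this toy game the informed agent's greedy policy becomes a best response to each observed action, and the uninformed agent then learns against that response. The resulting action pair can be checked against the Nash condition directly by confirming that neither agent gains from a unilateral deviation in `A` or `B`.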
