Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning

03/16/2023
by Xutong Zhao, et al.

Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this paper, we propose a method that encourages cooperative exploration, based on the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees). The high-level intuition is that, under optimism-based exploration, agents achieve cooperative strategies if each agent's optimism estimate captures a structured dependency on the other agents. At each node (i.e., action) of the search tree, UCT performs optimism-based exploration using a bonus conditioned on the visitation count of the node's parent. We view MARL as a sequence of tree search iterations and develop a method called Conditionally Optimistic Exploration (COE). We assume agents take actions in a sequential order and treat nodes at the same depth of the search tree as the actions of one individual agent. COE augments each agent's state-action value estimate with an optimistic bonus derived from the visitation count of the state and the joint actions taken by agents up to the current agent. COE is adaptable to any value decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.
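
To make the conditional bonus concrete, the following is a minimal tabular sketch, not the paper's implementation. It assumes discrete, hashable states, a fixed agent ordering, and a UCB1-style bonus of the form c * sqrt(ln N(parent) / N(child)) as used in UCT; the exact bonus form in COE may differ. The names PrefixCounter and select_joint_action are hypothetical and introduced only for illustration.

```python
import math
from collections import defaultdict


class PrefixCounter:
    """Visit counts for a state and every ordered prefix of the joint action."""

    def __init__(self):
        self.counts = defaultdict(int)

    def update(self, state, joint_action):
        # Increment N(s), N(s, a_1), N(s, a_1, a_2), ... along the tree path.
        for i in range(len(joint_action) + 1):
            self.counts[(state, tuple(joint_action[:i]))] += 1

    def bonus(self, state, prefix, action, c=1.0):
        """UCT-style bonus (assumed form) for `action` given earlier agents' `prefix`."""
        parent = self.counts[(state, tuple(prefix))]
        child = self.counts[(state, tuple(prefix) + (action,))]
        if child == 0:
            return math.inf  # untried actions are treated as maximally optimistic
        return c * math.sqrt(math.log(parent) / child)


def select_joint_action(q_values, counter, state, c=1.0):
    """Agents act in a fixed order; each maximizes its value estimate plus the
    bonus conditioned on the actions already chosen by preceding agents."""
    prefix = []
    for q_i in q_values:  # q_i[a] is agent i's value estimate for action a
        scores = [q + counter.bonus(state, prefix, a, c) for a, q in enumerate(q_i)]
        prefix.append(max(range(len(scores)), key=scores.__getitem__))
    return tuple(prefix)
```

In this sketch, each depth of the tree corresponds to one agent, so an agent's bonus shrinks only for the action sequences that have actually been tried together with the preceding agents' choices. A deep MARL setting would need approximate counts (for example, via hashing) in place of the exact table; that substitution is outside what this sketch shows.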


