Multiple Policy Value Monte Carlo Tree Search

05/31/2019
by Li-Cheng Lan, et al.

Many of the strongest game playing programs combine Monte Carlo tree search (MCTS) with deep neural networks (DNNs), where the DNNs serve as policy or value evaluators. Under a limited budget, as in online play or the self-play phase of AlphaZero (AZ) training, a balance must be struck between accurate state estimation and more MCTS simulations, both of which are critical for a strong game playing agent. Typically, larger DNNs generalize better and evaluate more accurately, while smaller DNNs are less costly and therefore allow more MCTS simulations and larger search trees within the same budget. This paper introduces a new method, multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain the advantages of each network; this paper uses two PV-NNs, f_S and f_L. Experiments on the game NoGo show that MPV-MCTS combining f_S and f_L outperforms policy value MCTS (PV-MCTS) with a single PV-NN. MPV-MCTS also outperforms PV-MCTS for AZ training.
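The budget trade-off described above can be illustrated with some simple arithmetic. The following is a minimal sketch, not the paper's implementation: the function names, the cost units, and the way the budget is divided between two networks are all hypothetical, chosen only to show why a cheaper evaluator affords more simulations under the same budget.

```python
def simulations_within_budget(budget, cost_per_eval):
    """Return how many network evaluations fit in `budget`.

    Assumes (hypothetically) one network evaluation per MCTS simulation
    and a fixed cost per evaluation, in the same units as `budget`.
    """
    if cost_per_eval <= 0:
        raise ValueError("cost_per_eval must be positive")
    return int(budget // cost_per_eval)


def split_budget(budget, cost_small, cost_large, fraction_small):
    """Divide one budget between a cheap and an expensive evaluator.

    `fraction_small` is an assumed tuning knob (0..1), not a value from
    the paper. Returns (simulations_small, simulations_large).
    """
    if not 0.0 <= fraction_small <= 1.0:
        raise ValueError("fraction_small must be in [0, 1]")
    budget_small = budget * fraction_small
    budget_large = budget - budget_small
    return (
        simulations_within_budget(budget_small, cost_small),
        simulations_within_budget(budget_large, cost_large),
    )


if __name__ == "__main__":
    # Illustrative numbers only: the large network is assumed to cost
    # 8 units per evaluation, the small one 1 unit.
    print(simulations_within_budget(800, 1))
    print(simulations_within_budget(800, 8))
    print(split_budget(800, 1, 8, 0.5))
```

With these made-up costs, a single budget buys either many cheap evaluations or few expensive ones; splitting it yields a mix of both, which is the kind of balance the abstract refers to.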


