Accelerating and Improving AlphaZero Using Population Based Training

by Ti-Rong Wu, et al.

AlphaZero has been very successful in many games. Unfortunately, it still consumes a huge amount of computing resources, most of which is spent on self-play. Hyperparameter tuning exacerbates the training cost, since each hyperparameter configuration requires its own training run, during which it generates its own self-play records. As a result, multiple runs are usually needed to cover different hyperparameter configurations. This paper proposes using population based training (PBT) to tune hyperparameters dynamically and to improve playing strength during training. Another significant advantage is that this method requires only a single run and incurs only a small additional time cost: the time for generating self-play records remains unchanged, and only the time for optimization increases, following the AlphaZero training algorithm. In our experiments on 9x9 Go, the PBT method achieves a higher win rate than the baselines, each trained individually with its own hyperparameter configuration. For 19x19 Go, PBT also improves playing strength. Specifically, the PBT agent obtains a win rate of up to 74% against a state-of-the-art AlphaZero program using a neural network of comparable capacity. By comparison, a saturated non-PBT agent achieves a win rate of 47%.
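The abstract describes PBT only at a high level. For orientation, the generic exploit-and-explore step of population based training can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Member` class, the `exploit_and_explore` function, the 25% truncation fraction, the 0.8/1.2 perturbation factors, and the `learning_rate` hyperparameter are hypothetical choices for illustration. `score` stands in for whatever strength measure (for example, a win rate from evaluation games) drives selection.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Member:
    """One agent in the population (hypothetical structure for illustration)."""
    hyperparams: dict                            # assumed all numeric, e.g. {"learning_rate": 0.01}
    weights: dict = field(default_factory=dict)  # stand-in for network parameters
    score: float = 0.0                           # strength estimate, e.g. an evaluation win rate


def exploit_and_explore(population, fraction=0.25, factors=(0.8, 1.2), rng=None):
    """Generic PBT step: the weakest members copy a strong member, then perturb.

    Mutates the members in place and returns the population list.
    """
    if rng is None:
        rng = random.Random()
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    n = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:n], ranked[-n:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: inherit the weights and hyperparameters of a stronger member.
        loser.weights = dict(winner.weights)
        loser.hyperparams = dict(winner.hyperparams)
        # Explore: scale each (numeric) hyperparameter by a randomly chosen factor.
        for key, value in list(loser.hyperparams.items()):
            loser.hyperparams[key] = value * rng.choice(factors)
    return population


# Usage: eight members whose score grows with their index.
pop = [Member({"learning_rate": 0.01 * (i + 1)}, {"w": i}, score=i / 10) for i in range(8)]
exploit_and_explore(pop, rng=random.Random(0))
# Members 0 and 1 (lowest scores) now carry the weights of member 6 or 7
# and a perturbed copy of that member's learning rate; members 2..7 are unchanged.
```

In an AlphaZero setting, the scores would presumably come from evaluation games between population members, and the copied weights would be full network parameters (requiring a deep copy) rather than a small dictionary; both points are assumptions of this sketch. Because every member keeps training from the same self-play pipeline, the extra cost falls mainly on optimization, which is consistent with the abstract's claim that self-play time is unchanged.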




Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep Learning

Deep learning has achieved impressive results on many problems. However,...

Biasing MCTS with Features for General Games

This paper proposes using a linear function approximator, rather than a ...

Multi-level Training and Bayesian Optimization for Economical Hyperparameter Optimization

Hyperparameters play a critical role in the performances of many machine...

Massively Parallel Hyperparameter Tuning

Modern learning models are characterized by large hyperparameter spaces....

Faster Improvement Rate Population Based Training

The successful training of neural networks typically involves careful an...

Enhanced Self-Organizing Map Solution for the Traveling Salesman Problem

Using an enhanced Self-Organizing Map method, we provided suboptimal sol...

One-Shot Bayes Opt with Probabilistic Population Based Training

Selecting optimal hyperparameters is a key challenge in machine learning...
