What model does MuZero learn?

06/01/2023
by Jinke He, et al.

Model-based reinforcement learning has drawn considerable interest in recent years, given its promise to improve sample efficiency. Moreover, when using deep-learned models, it is potentially possible to learn compact models from complex sensor data. However, the effectiveness of these learned models, particularly their capacity to plan, i.e., to improve the current policy, remains unclear. In this work, we study MuZero, a well-known deep model-based reinforcement learning algorithm, and explore how far it achieves its learning objective of a value-equivalent model and how useful the learned models are for policy improvement. Amongst various other insights, we conclude that the model learned by MuZero cannot effectively generalize to evaluate unseen policies, which limits the extent to which we can additionally improve the current policy by planning with the model.
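The generalization concern in the abstract can be illustrated with a small tabular sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state MDP, the hand-written "learned" model, the two policies, and the names `TRUE_MODEL`, `LEARNED_MODEL`, `evaluate`, and `max_value_error` are all hypothetical. The learned model is constructed to agree with the true environment only on the actions the behaviour policy takes, so it evaluates that policy exactly while misjudging a policy it never saw.

```python
# Toy illustration only: the MDP, both models, and both policies are made up
# for this sketch and are not taken from the paper or from MuZero itself.

GAMMA = 0.9

# model[state][action] = (reward, next_state); deterministic for simplicity.
TRUE_MODEL = {
    0: {0: (1.0, 0), 1: (0.0, 1)},
    1: {0: (0.0, 1), 1: (2.0, 1)},
}

# Hypothetical "learned" model: matches the true model on action 0 (the only
# action the behaviour policy takes) and is wrong on action 1.
LEARNED_MODEL = {
    0: {0: (1.0, 0), 1: (0.0, 0)},
    1: {0: (0.0, 1), 1: (0.0, 1)},
}


def evaluate(model, policy, gamma=GAMMA, sweeps=500):
    """Iterative policy evaluation for a deterministic policy {state: action}."""
    values = {state: 0.0 for state in model}
    for _ in range(sweeps):
        # Synchronous Bellman backup: the right-hand side reads the old values.
        values = {
            state: model[state][policy[state]][0]
            + gamma * values[model[state][policy[state]][1]]
            for state in model
        }
    return values


def max_value_error(policy):
    """Largest gap between the true and the model-based value of a policy."""
    true_values = evaluate(TRUE_MODEL, policy)
    model_values = evaluate(LEARNED_MODEL, policy)
    return max(abs(true_values[s] - model_values[s]) for s in TRUE_MODEL)


behaviour_policy = {0: 0, 1: 0}  # policy the model was (notionally) fit to
unseen_policy = {0: 1, 1: 1}     # policy the model never saw

print("error on behaviour policy:", max_value_error(behaviour_policy))
print("error on unseen policy:   ", max_value_error(unseen_policy))
```

In this toy setting the model is exact for the behaviour policy yet badly wrong for the unseen one, which is the kind of failure that would limit how much planning with the model can improve on the current policy.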


