Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

06/12/2018
by Yangchen Pan, et al.

Model-based strategies for control are critical for sample-efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning by simulating one-step experience to update the action-value function. This elegant planning strategy has been explored mostly in the tabular setting. The aim of this paper is to revisit sample-based planning in stochastic and continuous domains with learned models. We first highlight the flexibility afforded by a model over Experience Replay (ER). Replay-based methods can be seen as stochastic planning methods that repeatedly sample from a buffer of recent agent-environment interactions and perform updates to improve data efficiency. We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states, which propagate information backward from a state more quickly. We introduce a semi-parametric model learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors. We demonstrate that REM-Dyna exhibits similar advantages over replay-based methods when learning in continuous-state problems, and that the performance gap grows when moving to stochastic domains of increasing size.
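
The Dyna loop described in the abstract (learn from a real transition, record it in a model, then perform extra updates from simulated one-step experience) can be illustrated with a minimal tabular sketch. This is not the paper's REM-Dyna: the chain environment, the function name `dyna_q`, the table-based model, and all hyperparameter values are assumptions chosen only for illustration, and the paper itself targets continuous, stochastic domains with a learned semi-parametric model.

```python
import random
from collections import defaultdict


def dyna_q(n_states=6, episodes=30, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical deterministic chain.

    States are 0..n_states-1, actions are 0 (left) and 1 (right),
    and the only reward (1.0) is given on reaching the last state.
    """
    rng = random.Random(seed)
    q = defaultdict(float)   # action values, keyed by (state, action)
    model = {}               # learned one-step model: (s, a) -> (r, s2, done)

    def step(s, a):
        # Assumed environment dynamics: move left or right along the chain.
        s2 = max(0, s - 1) if a == 0 else s + 1
        done = s2 == n_states - 1
        return (1.0 if done else 0.0), s2, done

    def update(s, a, r, s2, done):
        # One-step Q-learning update, used for real and simulated experience.
        target = r if done else r + gamma * max(q[(s2, 0)], q[(s2, 1)])
        q[(s, a)] += alpha * (target - q[(s, a)])

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[(s, 0)], q[(s, 1)])
        return rng.choice([a for a in (0, 1) if q[(s, a)] == best])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # learn from the real transition
            model[(s, a)] = (r, s2, done)    # record it in the model
            for _ in range(planning_steps):  # plan with simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


if __name__ == "__main__":
    q = dyna_q()
    for s in range(5):
        print(s, round(q[(s, 0)], 3), round(q[(s, 1)], 3))
```

In this sketch the planning loop picks state-action pairs uniformly from everything the model has seen, which behaves much like sampling from a replay buffer. The abstract's point about predecessors would correspond to changing only that choice, for example by preferring pairs whose stored next state is the state that was just updated.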

research
02/20/2023

Understanding the effect of varying amounts of replay per step

Model-based reinforcement learning uses models to plan, where the predic...
research
05/30/2017

Experience Replay Using Transition Sequences

Experience replay is one of the most commonly used approaches to improve...
research
03/29/2022

Topological Experience Replay

State-of-the-art deep Q-learning methods update Q-values using state tra...
research
07/19/2020

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

Model-based reinforcement learning (MBRL) can significantly improve samp...
research
02/22/2018

Intrinsic Motivation and Mental Replay enable Efficient Online Adaptation in Stochastic Recurrent Networks

Autonomous robots need to interact with unknown, unstructured and changi...
research
06/05/2018

The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

Dyna is an architecture for reinforcement learning agents that interleav...
research
06/24/2019

Optimal Use of Experience in First Person Shooter Environments

Although reinforcement learning has made great strides recently, a conti...
