TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching

05/22/2023
by Yecheng Jason Ma, et al.

Standard model-based reinforcement learning (MBRL) approaches fit a transition model of the environment to all past experience, which wastes model capacity on data that is irrelevant to policy improvement. We instead propose a new "transition occupancy matching" (TOM) objective for MBRL model learning: a model is good to the extent that the current policy experiences the same distribution of transitions inside the model as in the real environment. We derive TOM directly from a novel lower bound on the standard reinforcement learning objective. To optimize TOM, we show how to reduce it to a form of importance-weighted maximum-likelihood estimation, in which automatically computed importance weights identify policy-relevant past experiences in a replay buffer, enabling stable optimization. TOM thus offers a plug-and-play model-learning subroutine that is compatible with any backbone MBRL algorithm. On various MuJoCo continuous robotic control tasks, we show that TOM successfully focuses model learning on policy-relevant experience and drives policies to higher task rewards faster than alternative model-learning approaches.
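
To make the phrase "importance-weighted maximum-likelihood estimation" concrete, here is a minimal sketch, not the authors' implementation. It assumes a linear model with fixed unit-variance Gaussian noise, so that weighted maximum likelihood reduces to weighted least squares. The function names, the synthetic replay buffer, and above all the hand-set weights are hypothetical: TOM computes its weights automatically, and that computation is not reproduced here.

```python
import numpy as np


def weighted_mle_linear_model(inputs, targets, weights):
    """Fit targets ~ inputs @ params by importance-weighted maximum likelihood.

    Under a fixed unit-variance Gaussian likelihood, maximizing
    sum_i w_i * log p(target_i | input_i) is weighted least squares,
    solved here by scaling each row by sqrt(w_i).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalizing does not change the maximizer
    sqrt_w = np.sqrt(w)[:, None]
    params, *_ = np.linalg.lstsq(sqrt_w * inputs, sqrt_w * targets, rcond=None)
    return params


def make_replay_buffer(seed=0, n=200):
    """Synthetic buffer (an assumption for illustration only).

    Half of the transitions come from a region the current policy no longer
    visits, half from a policy-relevant region. The two regions have
    different local dynamics, so one linear model cannot fit both.
    """
    rng = np.random.default_rng(seed)
    a_old = rng.normal(size=(3, 2))  # local dynamics in the stale region
    a_new = rng.normal(size=(3, 2))  # local dynamics in the relevant region
    x_old = rng.normal(size=(n, 3)) - 2.0
    x_new = rng.normal(size=(n, 3)) + 2.0
    y_old = x_old @ a_old + 0.01 * rng.normal(size=(n, 2))
    y_new = x_new @ a_new + 0.01 * rng.normal(size=(n, 2))
    inputs = np.vstack([x_old, x_new])
    targets = np.vstack([y_old, y_new])
    relevant = np.concatenate([np.zeros(n, bool), np.ones(n, bool)])
    return inputs, targets, relevant, a_new


if __name__ == "__main__":
    inputs, targets, relevant, a_new = make_replay_buffer()
    # Placeholder weights: large on policy-relevant transitions, tiny elsewhere.
    importance = np.where(relevant, 1.0, 1e-3)
    uniform = np.ones(len(inputs))
    for name, w in [("uniform", uniform), ("importance-weighted", importance)]:
        fit = weighted_mle_linear_model(inputs, targets, w)
        print(name, "max abs error on relevant dynamics:",
              np.abs(fit - a_new).max())
```

With uniform weights the model splits its limited capacity across both regions. With weights concentrated on the policy-relevant transitions, the same estimator fits the dynamics the current policy actually encounters, which is the behaviour the abstract attributes to TOM's weights.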


