Plug and Play, Model-Based Reinforcement Learning

08/20/2021
by Majid Abdolshah, et al.

Sample-efficient generalisation of reinforcement learning approaches has always been a challenge, especially for complex scenes with many components. In this work, we introduce Plug and Play Markov Decision Processes, an object-based representation that allows zero-shot integration of new objects from known object classes. This is achieved by representing the global transition dynamics as a union of local transition functions, each with respect to one active object in the scene. The transition dynamics of an object class can be pre-learnt and are thus ready to use in a new environment. Each active object is also endowed with its own reward function. Since there is no central reward function, the addition or removal of objects can be handled efficiently by updating only the reward functions of the objects involved. A new transfer learning mechanism is also proposed to adapt the reward function in such cases. Experiments show that our representation can achieve sample efficiency in a variety of set-ups.
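
The object-based decomposition described above can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D scene, the names `ActiveObject`, `wall_transition`, `goal_reward` and `step`, and the rule that local updates are merged by simple dictionary update are all assumptions made for illustration. It only shows the general idea that each active object contributes a local transition and its own reward, so a new object of a known class can be plugged in without touching the others.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical scene encoding: object name -> position on a 1-D line.
Scene = Dict[str, float]


@dataclass
class ActiveObject:
    name: str
    # Local transition for this object's class. It returns only the
    # scene attributes it changes (an empty dict means "no effect").
    transition: Callable[[str, Scene, float], Scene]
    # Reward function owned by this object; there is no central reward.
    reward: Callable[[str, Scene], float]


def wall_transition(name: str, scene: Scene, action: float) -> Scene:
    """Assumed 'wall' class: the agent cannot move onto or across the wall."""
    agent, wall = scene["agent"], scene[name]
    proposed = agent + action
    crosses = (agent < wall <= proposed) or (proposed <= wall < agent)
    return {"agent": agent} if crosses else {}


def no_effect(name: str, scene: Scene, action: float) -> Scene:
    """Assumed class with no influence on the dynamics (e.g. a goal marker)."""
    return {}


def no_reward(name: str, scene: Scene) -> float:
    return 0.0


def goal_reward(name: str, scene: Scene) -> float:
    """Assumed 'goal' reward: 1.0 when the agent sits on the goal."""
    return 1.0 if abs(scene["agent"] - scene[name]) < 1e-9 else 0.0


def step(scene: Scene, objects: List[ActiveObject], action: float) -> Tuple[Scene, float]:
    """Global step built from local pieces (illustrative merge rule)."""
    next_scene = dict(scene)
    next_scene["agent"] = scene["agent"] + action  # default free motion
    for obj in objects:
        # Merge each object's local update into the global next state.
        next_scene.update(obj.transition(obj.name, scene, action))
    # Total reward is the sum of the per-object rewards.
    total = sum(obj.reward(obj.name, next_scene) for obj in objects)
    return next_scene, total
```

In this sketch, adding a second wall amounts to appending another `ActiveObject` that reuses `wall_transition`; the existing objects and their reward functions are left unchanged.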


