PLOTS: Procedure Learning from Observations using Subtask Structure

04/17/2019
by Tong Mu et al.

In many settings an intelligent agent may want to learn to mimic a single observed demonstration trajectory. In this work we consider how to perform such procedure learning from observation, which could help agents make better use of the enormous amount of video data that consists of observation sequences alone. Our approach exploits the properties of this setting to incrementally build an open-loop action plan that can yield the desired subsequence, and it can be used in both Markov and partially observable Markov domains. In addition, procedures commonly involve repeated, temporally extended action subsequences, so our method explores actions optimistically to leverage potential repeated structure in the procedure. Compared with state-of-the-art approaches, we find that our explicit procedure learning from observation method is roughly 100 times faster than policy-gradient-based approaches that learn a stochastic policy, and is also faster than model-based approaches. We also find that optimistic action selection yields substantial speedups when latent dynamical structure is present.
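The abstract describes a core loop: replay the verified prefix of an open-loop plan, then try to extend it by one step, preferring actions that already produced the same observation transition earlier in the procedure. The sketch below is a minimal illustration of that idea, not the authors' PLOTS implementation, and it rests on strong assumptions: a deterministic, resettable environment, a discrete action set, and hashable observations. The names `env_reset`, `env_step`, `demo_obs`, `learn_open_loop_plan`, and the toy corridor environment are all hypothetical stand-ins.

```python
import random
from collections import defaultdict

def learn_open_loop_plan(env_reset, env_step, demo_obs, actions, max_episodes=5000):
    """Incrementally build an open-loop action plan reproducing demo_obs,
    a single demonstrated observation sequence.

    Assumes env_reset() returns the initial observation and env_step(a)
    returns the next observation, both deterministically."""
    known = {}                # (obs, next_obs) -> action that achieved it before
    tried = defaultdict(set)  # plan position -> actions already ruled out there
    plan = []                 # verified prefix of the open-loop plan

    for _ in range(max_episodes):
        obs = env_reset()
        for a in plan:        # replay the verified prefix open loop
            obs = env_step(a)
        while len(plan) < len(demo_obs) - 1:
            target = demo_obs[len(plan) + 1]
            # Optimism about repeated structure: reuse an action that
            # produced this exact observation transition earlier.
            a = known.get((obs, target))
            if a is None or a in tried[len(plan)]:
                untried = [x for x in actions if x not in tried[len(plan)]]
                if not untried:
                    return None        # no action reproduces this step
                a = random.choice(untried)
            nxt = env_step(a)
            known[(obs, nxt)] = a      # remember the transition either way
            if nxt != target:
                tried[len(plan)].add(a)
                break                  # wrong action here; reset and retry
            plan.append(a)
            obs = nxt
        else:
            return plan                # full demonstration reproduced
    return None

# Toy usage: a 5-cell corridor where observations are cell indices and
# actions are -1 (left) / +1 (right); the demo walks right twice, then left.
state = [0]
def env_reset():
    state[0] = 0
    return 0
def env_step(a):
    state[0] = max(0, min(4, state[0] + a))
    return state[0]

demo = [0, 1, 2, 1]
print(learn_open_loop_plan(env_reset, env_step, demo, actions=[-1, 1]))
# -> [1, 1, -1]
```

An open-loop plan suffices here because a single demonstration in a deterministic, resettable domain pins down one action sequence, so no state-conditioned policy needs to be learned; the `known` table is where repeated subtask structure pays off, since any observation transition that recurs later in the procedure is solved on the first try instead of re-explored.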
