Bootstrapping Skills

06/11/2015
by Daniel J. Mankowitz, et al.

The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions. For the monolithic approach to succeed (and this is not always possible), a complex feature representation is often necessary, since the policy is a complex object that has to prescribe what actions to take all over the state space. This is especially true in large domains with complicated dynamics. Learning and planning in MDPs with a complex monolithic representation is also computationally inefficient. We present a different approach in which we restrict the policy space to policies that can be represented as combinations of simpler, parameterized skills, a type of temporally extended action with a simple policy representation. We introduce Learning Skills via Bootstrapping (LSB), which can use a broad family of Reinforcement Learning (RL) algorithms as a "black box" to iteratively learn parameterized skills. Initially, the learned skills are short-sighted, but each iteration of the algorithm allows the skills to bootstrap off one another, improving each skill in the process. We prove that this bootstrapping process returns a near-optimal policy. Furthermore, our experiments demonstrate that LSB can solve MDPs that, given the same representational power, could not be solved by a monolithic approach. Thus, planning with learned skills results in better policies without requiring complex policy representations.
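
The bootstrapping idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical deterministic chain MDP with a goal at the right end, a fixed partition of the states into equal blocks (one skill per block), tabular skills instead of parameterized ones, and plain value iteration standing in for the "black box" RL algorithm. The names `step`, `learn_skill`, and `lsb` are illustrative only.

```python
def step(state, action, n_states):
    """Hypothetical chain dynamics: action is -1 (left) or +1 (right), walls at both ends."""
    return min(max(state + action, 0), n_states - 1)


def learn_skill(partition, values, n_states, gamma, sweeps=50):
    """Stand-in for the black-box RL solver: value iteration on one partition only.

    States outside the partition keep their current estimates in `values`,
    so a skill is rewarded for exiting into states that other skills value.
    """
    values = list(values)  # work on a copy, return the updated estimates
    goal = n_states - 1
    policy = {}
    for _ in range(sweeps):
        for s in partition:
            if s == goal:
                continue  # terminal state keeps value 0
            best_q, best_a = None, None
            for a in (-1, +1):  # strict '>' below means ties resolve to "left"
                s2 = step(s, a, n_states)
                reward = 1.0 if s2 == goal else 0.0
                q = reward + gamma * values[s2]
                if best_q is None or q > best_q:
                    best_q, best_a = q, a
            values[s], policy[s] = best_q, best_a
    return policy, values


def lsb(n_states=12, n_skills=3, iterations=3, gamma=0.9):
    """Repeatedly relearn every skill against the latest value estimates.

    Assumes n_states is divisible by n_skills.
    """
    size = n_states // n_skills
    partitions = [range(i * size, (i + 1) * size) for i in range(n_skills)]
    values = [0.0] * n_states
    policy = {}
    for _ in range(iterations):
        for partition in partitions:
            skill_policy, values = learn_skill(partition, values, n_states, gamma)
            policy.update(skill_policy)
    return policy, values


if __name__ == "__main__":
    for k in (1, 3):
        policy, _ = lsb(iterations=k)
        print(k, [policy[s] for s in sorted(policy)])
```

In this toy setting, after a single iteration only the skill whose block contains the goal moves toward it; the other skills see no reward signal and fall back to the tie-break. After three iterations (one per skill), the value estimates have propagated across block boundaries and every skill moves right, toward the goal.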


research
09/09/2011

Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

We study an approach to policy selection for large relational Markov Dec...
research
01/27/2022

Rethinking Learning Dynamics in RL using Adversarial Networks

We present a learning mechanism for reinforcement learning of closely re...
research
11/20/2018

Model Learning for Look-ahead Exploration in Continuous Control

We propose an exploration method that incorporates look-ahead search ove...
research
10/10/2016

Situational Awareness by Risk-Conscious Skills

Hierarchical Reinforcement Learning has been previously shown to speed u...
research
06/07/2022

Meta-Learning Transferable Parameterized Skills

We propose a novel parameterized skill-learning algorithm that aims to l...
research
02/10/2016

Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)

For complex, high-dimensional Markov Decision Processes (MDPs), it may b...
research
06/10/2021

Synthesising Reinforcement Learning Policies through Set-Valued Inductive Rule Learning

Today's advanced Reinforcement Learning algorithms produce black-box pol...
