Learning Reward Functions for Robotic Manipulation by Observing Humans

11/16/2022
by Minttu Alakuijala, et al.

Observing a human demonstrator manipulate objects provides a rich, scalable and inexpensive source of data for learning robotic policies. However, transferring skills from human videos to a robotic manipulator poses several challenges, not least a difference in action and observation spaces. In this work, we use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies. Thanks to the diversity of this training data, the learned reward function generalizes well enough to image observations from a previously unseen robot embodiment and environment to provide a meaningful prior for directed exploration in reinforcement learning. The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective. By conditioning the function on a goal image, we are able to reuse one model across a variety of tasks. Unlike prior work on leveraging human videos to teach robots, our method, Human Offline Learned Distances (HOLD), requires neither a priori data from the robot environment, nor a set of task-specific human demonstrations, nor a predefined notion of correspondence across morphologies. Even so, it accelerates training on several manipulation tasks on a simulated robot arm compared to using only a sparse reward obtained from task completion.
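
The abstract describes the reward as a distance to a goal image measured in a learned embedding space. The snippet below is a minimal sketch of that idea only, not the authors' implementation. The names `goal_conditioned_reward` and `toy_embed` are hypothetical, the Euclidean metric is an assumption, and `toy_embed` is a hand-written stand-in for an encoder that would in practice be trained on human videos with a time-contrastive objective.

```python
import math
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]  # toy image: rows of pixel intensities
Vector = List[float]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def goal_conditioned_reward(
    embed: Callable[[Image], Vector],
    observation: Image,
    goal_image: Image,
) -> float:
    """Negative embedding distance between the observation and the goal.

    Assumption: the reward is the negated distance, so it increases
    (towards 0) as the observation approaches the goal in embedding space.
    """
    return -euclidean_distance(embed(observation), embed(goal_image))


def toy_embed(image: Image) -> Vector:
    """Hypothetical stand-in encoder: the mean intensity of each row.

    A real encoder would be a neural network trained offline on video.
    """
    return [sum(row) / len(row) for row in image]


if __name__ == "__main__":
    goal = [[1.0, 1.0], [0.0, 0.0]]
    near = [[1.0, 0.0], [0.0, 0.0]]
    far = [[0.0, 0.0], [0.0, 0.0]]
    for name, obs in (("near", near), ("far", far)):
        print(name, goal_conditioned_reward(toy_embed, obs, goal))
```

Because the reward depends on the task only through the goal image, the same function can be reused for a different task by passing a different `goal_image`. A dense signal of this kind could be added to, or used in place of, a sparse task-completion reward during reinforcement learning.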

research
03/31/2021

Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos

We are motivated by the goal of generalist robots that can complete a wi...
research
03/02/2023

Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning

In imitation and reinforcement learning, the cost of human supervision l...
research
06/19/2023

LARG, Language-based Automatic Reward and Goal Generation

Goal-conditioned and Multi-Task Reinforcement Learning (GCRL and MTRL) a...
research
10/16/2018

Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation

A general-purpose intelligent robot must be able to learn autonomously a...
research
12/10/2019

AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos

Robotic reinforcement learning (RL) holds the promise of enabling robots...
research
12/30/2019

Learning Predictive Models From Observation and Interaction

Learning predictive models from interaction with the world allows an age...
research
07/29/2023

PIMbot: Policy and Incentive Manipulation for Multi-Robot Reinforcement Learning in Social Dilemmas

Recent research has demonstrated the potential of reinforcement learning...