Affordances from Human Videos as a Versatile Representation for Robotics

by Shikhar Bahl, et al.

Building a robot that can understand and learn to interact by watching humans has inspired several vision problems. However, despite some successful results on static datasets, it remains unclear how current models can be used on a robot directly. In this paper, we aim to bridge this gap by leveraging videos of human interactions in an environment-centric manner. Using internet videos of human behavior, we train a visual affordance model that estimates where and how in a scene a human is likely to interact. The structure of these behavioral affordances directly enables the robot to perform many complex tasks. We show how to seamlessly integrate our affordance model with four robot learning paradigms: offline imitation learning, exploration, goal-conditioned learning, and action parameterization for reinforcement learning. We demonstrate the efficacy of our approach, which we call VRB, across 4 real-world environments, over 10 different tasks, and 2 robotic platforms operating in the wild. Results, visualizations, and videos at
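The abstract describes an affordance model that predicts both *where* to contact a scene and *how* to move after contact. A minimal, hypothetical sketch of that interface is below; the actual VRB network is replaced by a toy stand-in (the function name, heatmap proxy, and trajectory parameterization are all assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def predict_affordance(image: np.ndarray, traj_len: int = 5):
    """Toy stand-in for a visual affordance model.

    Returns a contact-point heatmap of shape (H, W) and a short
    post-contact trajectory of 2D waypoints of shape (traj_len, 2).
    Both are derived deterministically from the image so the sketch
    runs without trained weights.
    """
    # Proxy "heatmap": image brightness, normalized to [0, 1].
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    heatmap = gray / (gray.max() + 1e-8)
    # Contact point = argmax of the heatmap (row, col).
    cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Placeholder post-contact motion: a short straight-line trajectory
    # starting at the contact point.
    direction = np.array([0.0, 1.0])  # assumed motion direction
    traj = np.array([[cy, cx]], dtype=float) \
        + np.arange(traj_len)[:, None] * direction
    return heatmap, traj

# Downstream use, e.g. as an action prior for a robot policy:
img = np.zeros((8, 8, 3))
img[2, 5] = 1.0  # brightest pixel -> predicted contact location
heatmap, traj = predict_affordance(img)
print(traj[0])  # trajectory starts at the contact point: [2. 5.]
```

In the paper's framing, such contact-point and trajectory predictions are what get plugged into the four downstream paradigms (e.g. as exploration targets or reward signals); this sketch only fixes the input/output shape of that interface.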



