Learning to Imitate Object Interactions from Internet Videos

11/23/2022
by Austin Patel, et al.

We study the problem of imitating object interactions from Internet videos. This requires understanding hand-object interactions in 4D (spatially in 3D and over time), which is challenging because the hand and the object occlude each other. In this paper we make two main contributions: (1) a novel reconstruction technique, RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; and (2) a system for imitating object interactions in a physics simulator with reinforcement learning. We apply our reconstruction technique to 100 challenging Internet videos and further show that we can successfully imitate a range of different object interactions in a physics simulator. Because our approach is object-centric, it is not limited to human-like end-effectors and can learn to imitate object interactions using different embodiments, such as a robotic arm with a parallel-jaw gripper.
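The abstract says RHOV combines 2D image cues with temporal smoothness constraints but does not give the formulation. Purely as an illustration of that general idea, the sketch below scores a candidate 3D trajectory with a 2D reprojection term plus a squared-acceleration (second-difference) smoothness term. The pinhole projection, the function names (`project`, `smoothness_penalty`, `reprojection_error`, `objective`), and the `smooth_weight` parameter are hypothetical assumptions, not the paper's actual method.

```python
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(p: Point3, focal: float = 1.0) -> Point2:
    """Hypothetical pinhole projection of a camera-frame 3D point (assumes z > 0)."""
    x, y, z = p
    return (focal * x / z, focal * y / z)


def smoothness_penalty(traj: Sequence[Point3]) -> float:
    """Sum of squared second differences (discrete acceleration) across frames.

    A constant-velocity trajectory has zero penalty; jittery ones are penalized.
    """
    total = 0.0
    for t in range(1, len(traj) - 1):
        for d in range(3):
            acc = traj[t + 1][d] - 2.0 * traj[t][d] + traj[t - 1][d]
            total += acc * acc
    return total


def reprojection_error(
    traj: Sequence[Point3], observed_2d: Sequence[Point2], focal: float = 1.0
) -> float:
    """Squared distance between projected 3D points and observed 2D cues, per frame."""
    total = 0.0
    for p, (u, v) in zip(traj, observed_2d):
        pu, pv = project(p, focal)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


def objective(
    traj: Sequence[Point3],
    observed_2d: Sequence[Point2],
    smooth_weight: float = 1.0,
    focal: float = 1.0,
) -> float:
    """Assumed combined cost: 2D data term + weighted temporal smoothness term."""
    return reprojection_error(traj, observed_2d, focal) + smooth_weight * smoothness_penalty(traj)


if __name__ == "__main__":
    # A constant-velocity trajectory whose projections match the observations exactly.
    linear = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 0.0, 2.0)]
    obs = [project(p) for p in linear]
    print(objective(linear, obs))  # both terms vanish for this trajectory
```

In an actual reconstruction system, a cost like this would be minimized over the trajectory (and likely over hand and object pose parameters) with a gradient-based optimizer; the weight trades off fidelity to per-frame 2D evidence against temporal coherence, which helps when occlusion makes individual frames unreliable.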
