Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

07/30/2020
by Jason Y. Zhang, et al.

We present a method that infers the spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single in-the-wild image captured in an uncontrolled environment. Notably, our method runs on datasets without any scene- or object-level 3D supervision. Our key insight is that considering humans and objects jointly gives rise to "3D common sense" constraints that can be used to resolve ambiguity. In particular, we introduce a scale loss that learns the distribution of object size from data; an occlusion-aware silhouette re-projection loss to optimize object pose; and a human-object interaction loss to capture the spatial layout of objects with which humans interact. We empirically validate that our constraints dramatically reduce the space of likely 3D spatial configurations. We demonstrate our approach on challenging, in-the-wild images of humans interacting with large objects (such as bicycles, motorcycles, and surfboards) and handheld objects (such as laptops, tennis rackets, and skateboards). We quantify the ability of our approach to recover human-object arrangements and outline remaining challenges in this domain. The project webpage can be found at https://jasonyzhang.com/phosa.
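To make the three constraints named in the abstract more concrete, here is a minimal toy sketch of how such terms might be combined into a single objective. It is not the authors' formulation: the function names, the squared log-ratio form of the scale term, the per-pixel mask comparison, the part-center distance, and the weights are all assumptions chosen for illustration.

```python
from math import log


def scale_loss(scale, prior_scale):
    """Assumed form: squared log-ratio between an object's scale and a
    category-level prior scale. Zero when the two agree."""
    return log(scale / prior_scale) ** 2


def occlusion_aware_silhouette_loss(rendered, target, occluded):
    """Assumed form: mean squared difference between a rendered mask and a
    target mask, skipping pixels flagged as occluded by something else."""
    total, count = 0.0, 0
    for r_row, t_row, o_row in zip(rendered, target, occluded):
        for r, t, o in zip(r_row, t_row, o_row):
            if not o:  # only visible pixels contribute
                total += (r - t) ** 2
                count += 1
    return total / count if count else 0.0


def interaction_loss(human_part_center, object_part_center):
    """Assumed form: squared 3D distance between a human part and the
    object part it is presumed to touch (e.g. hands and handlebars)."""
    return sum((h - o) ** 2
               for h, o in zip(human_part_center, object_part_center))


def total_loss(terms, weights):
    """Weighted sum of the individual loss terms (weights are hypothetical)."""
    return sum(w * t for w, t in zip(weights, terms))


# Toy inputs: a 2x2 mask with one occluded pixel, and two 3D points.
rendered = [[1, 0], [1, 1]]
target = [[1, 1], [0, 1]]
occluded = [[False, True], [False, False]]

terms = [
    scale_loss(2.0, 2.0),
    occlusion_aware_silhouette_loss(rendered, target, occluded),
    interaction_loss((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)),
]
objective = total_loss(terms, weights=(1.0, 1.0, 0.5))
```

In this toy setup the mismatch at the occluded pixel is ignored, so only visible evidence penalizes the object pose. A real system would minimize such an objective over object pose and scale with a differentiable renderer, which this sketch leaves out.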


