Scene-aware Egocentric 3D Human Pose Estimation

by Jian Wang, et al.

Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality. Existing methods still struggle with challenging poses where the human body is highly occluded or closely interacting with the scene. To address this issue, we propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints. To this end, we propose an egocentric depth estimation network that predicts the scene depth map from a wide-view egocentric fisheye camera while mitigating the occlusion caused by the human body with a depth-inpainting network. Next, we propose a scene-aware pose estimation network that projects the 2D image features and the estimated scene depth map into a voxel space and regresses the 3D pose with a V2V network. The voxel-based feature representation provides a direct geometric connection between 2D image features and scene geometry, and further lets the V2V network constrain the predicted pose based on the estimated scene geometry. To enable the training of the aforementioned networks, we also generated a synthetic dataset, called EgoGTA, and an in-the-wild dataset based on EgoPW, called EgoPW-Scene. Experimental results on our new evaluation sequences show that the predicted 3D egocentric poses are accurate and physically plausible in terms of human-scene interaction, demonstrating that our method outperforms the state-of-the-art methods both quantitatively and qualitatively.
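
The step of lifting an estimated depth map into a voxel space can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the equidistant fisheye model (r = focal * theta), the principal point at the image centre, the grid size, the metric extent, and the function name `depth_to_voxels` are all assumptions made for illustration, and the sketch only fills a binary occupancy grid rather than projecting learned image features.

```python
import numpy as np


def depth_to_voxels(depth, focal, grid_size=32, extent=2.0):
    """Back-project a fisheye depth map into a binary occupancy grid.

    Assumptions (not taken from the paper): equidistant fisheye model,
    principal point at the image centre, `depth` holding the distance
    along each camera ray, and a cubic grid of side `extent` that is
    centred on the optical axis and starts at the camera (z = 0).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    x = u - (w - 1) / 2.0
    y = v - (h - 1) / 2.0

    # Equidistant model: image radius is proportional to the ray angle.
    theta = np.hypot(x, y) / focal
    phi = np.arctan2(y, x)

    # Unit ray direction for every pixel, shape (h, w, 3).
    dirs = np.stack(
        [np.sin(theta) * np.cos(phi),
         np.sin(theta) * np.sin(phi),
         np.cos(theta)],
        axis=-1,
    )

    # Keep pixels with a positive depth and a ray angle below pi.
    valid = (depth > 0) & (theta < np.pi)
    points = dirs[valid] * depth[valid][:, None]

    # Convert metric coordinates to integer voxel indices.
    origin = np.array([-extent / 2.0, -extent / 2.0, 0.0])
    voxel_size = extent / grid_size
    idx = np.floor((points - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)

    grid = np.zeros((grid_size,) * 3, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid


# Usage: a flat synthetic depth map, one metre along every ray.
occupancy = depth_to_voxels(np.full((5, 5), 1.0), focal=100.0)
```

In a pipeline of the kind the abstract describes, per-pixel feature vectors would be scattered into the same voxel indices alongside the scene geometry, so that a 3D network can reason about both in one coordinate frame.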




Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision

Egocentric 3D human pose estimation with a single fisheye camera has dra...

Geometric Pose Affordance: 3D Human Pose with Scene Constraints

Full 3D estimation of human pose from a single image remains a challengi...

Scene-aware Human Pose Generation using Transformer

Affordance learning considers the interaction opportunities for an actor...

Estimating Egocentric 3D Human Pose in Global Space

Egocentric 3D human pose estimation using a single fisheye camera has be...

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

We present SLOPER4D, a novel scene-aware dataset collected in large urba...

PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound

Reconstructing the 3D pose of a person in metric scale from a single vie...

VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild

We present VoxelTrack for multi-person 3D pose estimation and tracking f...
