Synthetic Training for Monocular Human Mesh Recovery

by Yu Sun, et al.

Recovering 3D human meshes from monocular images is a popular topic in computer vision and has a wide range of applications. This paper aims to estimate the 3D meshes of multiple body parts (e.g., body, hands) with large scale differences from a single RGB image. Existing methods are mostly based on iterative optimization, which is very time-consuming. We propose to train a single-shot model to achieve this goal. The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images. To solve this problem, we design a multi-branch framework that disentangles the regression of different body properties, enabling us to train each component separately in a synthetic manner using the unpaired data that is available. In addition, to strengthen generalization, most existing methods use in-the-wild 2D pose datasets to supervise the estimated 3D pose via 3D-to-2D projection. However, we observe that the commonly used weak-perspective model performs poorly in dealing with the external foreshortening effect of camera projection. Therefore, we propose a depth-to-scale (D2S) projection that incorporates the depth difference into the projection function to derive per-joint scale variants for more proper supervision. In our evaluation, the proposed method outperforms previous methods on the CMU Panoptic Studio dataset and achieves comparable results on the Human3.6M body and STB hand benchmarks. More impressively, performance on close-shot images improves significantly when the proposed D2S projection is used for weak supervision, while the method retains a clear advantage in computational efficiency.
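
The projection issue described above can be illustrated with a minimal Python sketch. This is not the paper's D2S formulation, which the abstract does not spell out. It only contrasts a weak-perspective projection (one scale for all joints) with a projection whose scale varies per joint according to that joint's depth offset from the root, derived from the standard pinhole model. The function names, the root-relative joint convention, and the `root_depth` parameter are assumptions made for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]  # (x, y, dz): root-relative, dz = depth offset from root
Point2D = Tuple[float, float]


def weak_perspective(joints: List[Point3D], scale: float,
                     trans: Point2D) -> List[Point2D]:
    """Project with a single global scale; per-joint depth is ignored."""
    tx, ty = trans
    return [(scale * x + tx, scale * y + ty) for x, y, _ in joints]


def per_joint_scale_projection(joints: List[Point3D], scale: float,
                               trans: Point2D,
                               root_depth: float) -> List[Point2D]:
    """Project with a scale that depends on each joint's depth offset.

    Under a pinhole camera with focal length f, a point at depth
    root_depth + dz projects with scale f / (root_depth + dz).
    Writing scale = f / root_depth (an assumed parameterization) gives
    the per-joint scale  scale / (1 + dz / root_depth).
    """
    if root_depth <= 0:
        raise ValueError("root_depth must be positive")
    tx, ty = trans
    projected = []
    for x, y, dz in joints:
        ratio = 1.0 + dz / root_depth
        if ratio <= 0:
            raise ValueError("joint lies at or behind the camera plane")
        s_i = scale / ratio  # joints closer to the camera get a larger scale
        projected.append((s_i * x + tx, s_i * y + ty))
    return projected


if __name__ == "__main__":
    # A hand joint 0.3 units in front of the root, and a joint at root depth.
    joints = [(0.2, 0.1, -0.3), (0.2, 0.1, 0.0)]
    print(weak_perspective(joints, scale=2.0, trans=(0.0, 0.0)))
    # Close shot: the root is only 1.0 unit from the camera.
    print(per_joint_scale_projection(joints, 2.0, (0.0, 0.0), root_depth=1.0))
    # Distant shot: the two projections nearly coincide.
    print(per_joint_scale_projection(joints, 2.0, (0.0, 0.0), root_depth=1000.0))
```

In this sketch the two projections agree for joints at the root depth and for distant subjects, and diverge as the subject approaches the camera, which is the close-shot regime where the abstract reports the largest gains.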




Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

As it is hard to calibrate single-view RGB images in the wild, existing ...

HandTailor: Towards High-Precision Monocular 3D Hand Recovery

3D hand pose estimation and shape recovery are challenging tasks in comp...

MUG: Multi-human Graph Network for 3D Mesh Reconstruction from 2D Pose

Reconstructing multi-human body mesh from a single monocular image is an...

Recovering 3D Human Mesh from Monocular Images: A Survey

Estimating human pose and shape from monocular images is a long-standing...

Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation

We describe an end-to-end method for recovering 3D human body mesh from ...

BCNet: Learning Body and Cloth Shape from A Single Image

In this paper, we consider the problem to automatically reconstruct garm...

Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild

Recovering 3D human mesh in the wild is greatly challenging as in-the-wi...
