End-to-End Human Pose and Mesh Reconstruction with Transformers

by Kevin Lin et al.

We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. Our method uses a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. Unlike existing techniques that regress pose and shape parameters, METRO does not rely on any parametric mesh model such as SMPL, so it can be easily extended to other objects such as hands. We further relax the mesh topology and allow the transformer self-attention mechanism to attend freely between any two vertices, making it possible to learn non-local relationships among mesh vertices and joints. With the proposed masked vertex modeling, our method is more robust and effective in challenging situations such as partial occlusions. METRO achieves new state-of-the-art results for human mesh reconstruction on the public Human3.6M and 3DPW datasets. Moreover, we demonstrate that METRO generalizes to 3D hand reconstruction in the wild, outperforming existing state-of-the-art methods on the FreiHAND dataset.
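
To make the two ideas in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It treats joints and vertices as one sequence of query tokens, replaces a random subset with a mask token (in the spirit of masked vertex modeling), and then applies unrestricted self-attention so that every token can attend to every other one. The token counts, the 25% mask ratio, the all-zero mask token, and the use of identity projections instead of learned query/key/value weights are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def mask_queries(queries, mask_ratio, rng, mask_token):
    """Replace a random subset of query tokens with a mask token.

    Returns the masked copy and the set of masked indices.
    """
    n = len(queries)
    n_masked = int(round(mask_ratio * n))
    masked_idx = set(rng.sample(range(n), n_masked))
    masked = [list(mask_token) if i in masked_idx else list(q)
              for i, q in enumerate(queries)]
    return masked, masked_idx


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections (no learned Q/K/V weights), and no
    attention mask, so any token may attend to any other token.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * tokens[j][c] for j in range(len(tokens)))
                        for c in range(d)])
    return outputs, weights


# Toy sizes (hypothetical): 3 joint tokens followed by 5 vertex tokens.
NUM_JOINTS, NUM_VERTICES, DIM = 3, 5, 4
rng = random.Random(0)
queries = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)]
           for _ in range(NUM_JOINTS + NUM_VERTICES)]

# Mask 25% of the queries (assumed ratio), then attend over all tokens.
masked_queries, masked_idx = mask_queries(queries, 0.25, rng, [0.0] * DIM)
outputs, attn = self_attention(masked_queries)
```

Because no attention mask is applied, each row of `attn` spans all joint and vertex tokens, which is the "freely attend between any two vertices" property described above. In a trained model the masked tokens would have to be reconstructed from the unmasked ones, which is one plausible reading of why masking helps with partial occlusion.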



Mesh Graphormer

We present a graph-convolution-reinforced transformer, named Mesh Grapho...

MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction

We present Mesh Pre-Training (MPT), a new pre-training framework that le...

MeshLeTemp: Leveraging the Learnable Vertex-Vertex Relationship to Generalize Human Pose and Mesh Reconstruction for In-the-Wild Scenes

We present MeshLeTemp, a powerful method for 3D human pose and mesh reco...

GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose

3D human mesh recovery from a 2D pose plays an important role in various...

Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation

3D human pose and shape recovery from a monocular RGB image is a challen...

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

In this study, we focus on the problem of 3D human mesh recovery from a ...

A Modular Multi-stage Lightweight Graph Transformer Network for Human Pose and Shape Estimation from 2D Human Pose

In this research, we address the challenge faced by existing deep learni...
