Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers

07/27/2022
by Junhyeong Cho, et al.

Transformer encoder architectures have recently achieved state-of-the-art results on monocular 3D human mesh reconstruction, but they require a substantial number of parameters and expensive computations. Due to the large memory overhead and slow inference speed, it is difficult to deploy such models for practical use. In this paper, we propose a novel transformer encoder-decoder architecture for 3D human mesh reconstruction from a single image, called FastMETRO. We identify that the performance bottleneck in encoder-based transformers is caused by the token design, which introduces high-complexity interactions among input tokens. We disentangle these interactions via an encoder-decoder architecture, which allows our model to require far fewer parameters and a shorter inference time. In addition, we impose prior knowledge of the human body's morphological relationships via attention masking and mesh upsampling operations, which leads to faster convergence with higher accuracy. Our FastMETRO improves the Pareto front of accuracy and efficiency, and clearly outperforms image-based methods on Human3.6M and 3DPW. Furthermore, we validate its generalizability on FreiHAND.
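The morphology-based attention masking described above can be understood as ordinary scaled dot-product attention in which disallowed token pairs (e.g., joints that are not connected in the body skeleton) receive a score of negative infinity before the softmax, so their attention weight becomes zero. The following is a minimal, dependency-free sketch of that mechanism; the function name, the toy mask, and all values are illustrative assumptions, not the authors' released code.

```python
import math

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention with a boolean attention mask.

    Q, K, V: lists of equal-length vectors (rows).
    mask[i][j] = True means query token i may attend to key token j;
    disallowed pairs are set to -inf before the softmax, which is the
    core of morphology-based attention masking (illustrative sketch).
    """
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        # Raw similarity scores, with masked pairs forced to -inf.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if mask[i][j] else float("-inf")
            for j, k in enumerate(K)
        ]
        # Numerically stable softmax over the allowed positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value vectors.
        out.append([
            sum(w * v[c] for w, v in zip(weights, V))
            for c in range(len(V[0]))
        ])
    return out

# Toy usage: one query token allowed to attend only to the first key,
# so the output is exactly that key's value vector.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(masked_attention(Q, K, V, mask=[[True, False]]))  # [[1.0, 2.0]]
```

In the paper's setting, the mask would encode which joint and vertex tokens are adjacent in the body mesh, so that attention follows the skeleton's structure instead of treating all token pairs uniformly.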


Related research

11/19/2022 · TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer
In this paper, we introduce a set of effective TOken REduction (TORE) st...

04/01/2021 · Mesh Graphormer
We present a graph-convolution-reinforced transformer, named Mesh Grapho...

11/24/2022 · MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction
We present Mesh Pre-Training (MPT), a new pre-training framework that le...

07/31/2023 · JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery
In this study, we focus on the problem of 3D human mesh recovery from a ...

03/10/2023 · GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose
3D human mesh recovery from a 2D pose plays an important role in various...

07/21/2021 · Multi-Stream Transformers
Transformer-based encoder-decoder models produce a fused token-wise repr...

11/22/2021 · DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
Deep network architectures struggle to continually learn new tasks witho...
