MotionBERT: Unified Pretraining for Human Motion Analysis

10/12/2022
by Wentao Zhu, et al.

We present MotionBERT, a unified pretraining framework for tackling different sub-tasks of human motion analysis, including 3D pose estimation, skeleton-based action recognition, and mesh recovery. The framework can exploit heterogeneous human motion data, including motion capture data and in-the-wild videos. During pretraining, the pretext task requires the motion encoder to recover the underlying 3D motion from noisy, partial 2D observations. The pretrained motion representation thereby acquires geometric, kinematic, and physical knowledge about human motion and transfers readily to multiple downstream tasks. We implement the motion encoder as a novel Dual-stream Spatio-temporal Transformer (DSTformer). It captures long-range spatio-temporal relationships among the skeletal joints comprehensively and adaptively, as exemplified by the lowest 3D pose estimation error reported so far when trained from scratch. More importantly, the framework achieves state-of-the-art performance on all three downstream tasks by simply finetuning the pretrained motion encoder with 1-2 linear layers, which demonstrates the versatility of the learned motion representations.
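To make the pretext task concrete, the following is a minimal NumPy sketch of how a noisy, partial 2D input could be derived from a 3D motion clip and how a reconstruction could be scored. It is an illustration only: the abstract does not specify the projection, the noise model, the masking scheme, or the loss, so the orthographic projection, Gaussian jitter, per-joint random masking, mean per-joint distance, and the names `corrupt_2d_sequence`, `reconstruction_loss`, `mask_ratio`, and `noise_std` are all assumptions made for this example.

```python
import numpy as np


def corrupt_2d_sequence(joints_3d, mask_ratio=0.15, noise_std=0.02, seed=0):
    """Build a noisy, partial 2D input from a 3D motion clip (hypothetical helper).

    joints_3d: array of shape (T, J, 3), i.e. frames x joints x (x, y, z).
    Returns the corrupted 2D input of shape (T, J, 2) and a boolean
    keep-mask of shape (T, J), where True means the joint is observed.
    """
    rng = np.random.default_rng(seed)
    # Assumption: orthographic projection, i.e. simply drop the depth axis.
    joints_2d = joints_3d[..., :2].copy()
    # Assumption: Gaussian jitter stands in for 2D detector noise.
    joints_2d += rng.normal(0.0, noise_std, size=joints_2d.shape)
    # Assumption: hide individual joints at random to imitate occlusion.
    keep = rng.random(joints_2d.shape[:2]) >= mask_ratio
    joints_2d[~keep] = 0.0
    return joints_2d, keep


def reconstruction_loss(pred_3d, target_3d):
    """Mean Euclidean distance per joint between predicted and true 3D motion."""
    return float(np.mean(np.linalg.norm(pred_3d - target_3d, axis=-1)))


if __name__ == "__main__":
    # Synthetic clip: 8 frames, 17 joints, random coordinates.
    clip = np.random.default_rng(1).normal(size=(8, 17, 3))
    x2d, keep = corrupt_2d_sequence(clip)
    print(x2d.shape, keep.shape)
```

In a pretraining loop of this kind, a motion encoder would take `x2d` as input and be trained to minimize `reconstruction_loss` against the original `clip`; the encoder itself (DSTformer in the paper) is deliberately left out of this sketch.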

research
04/23/2021

Skeletor: Skeletal Transformers for Robust Body-Pose Estimation

Predicting 3D human pose from a single monoscopic video can be highly ch...
research
11/29/2022

Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos

Previous video-based human pose estimation methods have shown promising ...
research
02/17/2023

Self-supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences

Self-supervised learning has demonstrated remarkable capability in repre...
research
09/16/2023

RMP: A Random Mask Pretrain Framework for Motion Prediction

As the pretraining technique is growing in popularity, little work has b...
research
02/14/2021

Learning Self-Similarity in Space and Time as Generalized Motion for Action Recognition

Spatio-temporal convolution often fails to learn motion dynamics in vide...
research
04/10/2017

Learning Human Motion Models for Long-term Predictions

We propose a new architecture for the learning of predictive spatio-temp...
research
03/10/2023

GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose

3D human mesh recovery from a 2D pose plays an important role in various...
