Direct Multi-view Multi-person 3D Pose Estimation

11/07/2021
by PetsTime, et al.

We present the Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images. Instead of estimating 3D joint locations from a costly volumetric representation or reconstructing per-person 3D poses from multiple detected 2D poses, as previous methods do, MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks. Specifically, MvP represents skeleton joints as learnable query embeddings and lets them progressively attend to and reason over the multi-view information from the input images to directly regress the actual 3D joint locations. To improve the accuracy of such a simple pipeline, MvP uses a hierarchical scheme to concisely represent the query embeddings of multi-person skeleton joints and introduces an input-dependent query adaptation approach. Further, MvP designs a novel geometrically guided attention mechanism, called projective attention, to more precisely fuse the cross-view information for each joint. MvP also introduces a RayConv operation that integrates the view-dependent camera geometry into the feature representations to augment the projective attention. We show experimentally that MvP outperforms the state-of-the-art methods on several benchmarks while being much more efficient. Notably, it achieves 92.3% AP25 on the Panoptic dataset, improving upon the previous best approach [36] by 9.8%. MvP is general and also extendable to recovering human meshes represented by the SMPL model, and is thus useful for modeling multi-person body shapes. Code and models are available at https://github.com/sail-sg/mvp.
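The abstract does not spell out how camera geometry enters the model, so the following is only a minimal sketch of two standard pinhole-camera operations that a geometrically guided attention and a ray-based feature augmentation could plausibly build on: projecting a 3D joint estimate into a view, and computing per-pixel viewing-ray directions. The function names `project_points` and `pixel_rays`, the camera convention `x_cam = R @ x_world + t`, and all numeric values are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def project_points(points_3d, K, R, t):
    """Project (N, 3) world points into a pinhole camera; returns (N, 2) pixels.

    Assumed convention (hypothetical, not from the paper): x_cam = R @ x_world + t.
    """
    cam = points_3d @ R.T + t          # world -> camera coordinates
    uvw = cam @ K.T                    # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide


def pixel_rays(height, width, K, R):
    """Unit viewing-ray directions in world coordinates, shape (H, W, 3)."""
    # Pixel centres sit at half-integer coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    # Back-project with K^-1, then rotate camera directions into the world frame.
    dirs = pix @ np.linalg.inv(K).T @ R
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)


# Illustrative camera: all values below are made up for the example.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(30.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.1, -0.2, 3.0])

joints = np.array([[0.0, 0.0, 0.0],
                   [0.2, 0.5, 0.1]])      # two hypothetical 3D joint estimates
uv = project_points(joints, K, R, t)      # where each joint lands in this view
rays = pixel_rays(48, 64, K, R)           # one ray direction per pixel
```

In a multi-view setting, the same 3D point would be projected with each camera's own `K`, `R`, and `t`, giving one 2D location per view around which image features could be sampled. The ray map has the same spatial layout as an image, so it could be concatenated with a feature map channel-wise; whether RayConv does exactly this is not stated in the abstract.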


