Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes

by Bohao Fan, et al.

3D human pose estimation in outdoor environments has attracted increasing attention. However, existing outdoor 3D human pose datasets lack diversity: most provide only one modality (RGB images or point clouds) and often feature only one person per scene. This limited scope considerably restricts the variability of available data. In this article, we propose Human-M3, an outdoor multi-modal, multi-view, multi-person human pose database that includes not only multi-view RGB videos of outdoor scenes but also the corresponding point clouds. To obtain accurate human poses, we propose an algorithm that takes multi-modal data as input to generate ground truth annotations. The algorithm benefits from robust point cloud detection and tracking, which resolves the inaccurate human localization and matching ambiguity that can arise when only multi-view RGB videos are used in outdoor multi-person scenes, and thus generates reliable ground truth annotations. Evaluation of algorithms across multiple modalities shows that this database is challenging and suitable for future research. Furthermore, we propose a 3D human pose estimation algorithm based on multi-modal data input, which demonstrates the advantages of multi-modal input for 3D human pose estimation. Code and data will be released on




A Multi-view RGB-D Approach for Human Pose Estimation in Operating Rooms

Many approaches have been proposed for human pose estimation in single a...

EgoBody: Human Body Shape, Motion and Social Interactions from Head-Mounted Devices

Understanding social interactions from first-person views is crucial for...

The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose

The availability of a large labeled dataset is a key requirement for app...

X-MAS: Extremely Large-Scale Multi-Modal Sensor Dataset for Outdoor Surveillance in Real Environments

In robotics and computer vision communities, extensive studies have been...

Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

We propose a system for rearranging objects in a scene to achieve a desi...

FreeMan: Towards Benchmarking 3D Human Pose Estimation in the Wild

Estimating the 3D structure of the human body from natural scenes is a f...

Mixture Dense Regression for Object Detection and Human Pose Estimation

Mixture models are well-established machine learning approaches that, in...
