Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D Human Pose Estimation

by Kun Zhang, et al.

The goal of 2D human pose estimation is to locate the keypoints of body parts in 2D input images. State-of-the-art pose estimation methods usually construct pixel-wise heatmaps from keypoints and use them as labels for training convolutional neural networks, whose backbones are initialized either randomly or from classification models trained on ImageNet. We note that the 2D pose estimation task depends heavily on the contextual relationships between image patches, and we therefore introduce a self-supervised method for pretraining 2D pose estimation networks. Specifically, we propose the Heatmap-Style Jigsaw Puzzles (HSJP) problem as our pretext task, whose goal is to learn the location of each patch in an image composed of shuffled patches. During pretraining, we use only images of person instances in MS-COCO, rather than introducing the additional and much larger ImageNet dataset. We design a heatmap-style label for patch location, and the learning process is non-contrastive. The weights learned on the HSJP pretext task are used as the backbones of 2D human pose estimators, which are then finetuned on the MS-COCO human keypoints dataset. With two popular and strong 2D human pose estimators, HRNet and SimpleBaseline, we evaluate the mAP score on both the MS-COCO validation and test-dev sets. Our experiments show that downstream pose estimators with our self-supervised pretraining perform much better than those trained from scratch, and are comparable to those that use ImageNet classification models as their initial backbones.
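The abstract describes the HSJP pretext task only at a high level, so the following is a minimal sketch of one way such a training sample could be built, not the authors' implementation. The function name `make_hsjp_sample`, the 3x3 grid, the heatmap stride, the Gaussian `sigma`, and the choice of one heatmap channel per original patch (peaking at the center of the slot where that patch was placed) are all assumptions made for illustration.

```python
import numpy as np


def make_hsjp_sample(image, grid=3, heatmap_stride=4, sigma=2.0, rng=None):
    """Build one (shuffled image, heatmap label) pair for a jigsaw pretext task.

    Hypothetical helper: grid size, stride, sigma, and the label layout are
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    # Crop so the image divides evenly into grid x grid patches.
    image = image[: ph * grid, : pw * grid]
    n = grid * grid

    # perm[slot] is the index of the original patch placed into that slot.
    perm = rng.permutation(n)

    shuffled = np.empty_like(image)
    hm_h, hm_w = (ph * grid) // heatmap_stride, (pw * grid) // heatmap_stride
    ys, xs = np.mgrid[0:hm_h, 0:hm_w]
    heatmaps = np.zeros((n, hm_h, hm_w), dtype=np.float32)

    for slot, src in enumerate(perm):
        s_row, s_col = divmod(slot, grid)      # where the patch ends up
        o_row, o_col = divmod(int(src), grid)  # where the patch came from
        shuffled[s_row * ph:(s_row + 1) * ph, s_col * pw:(s_col + 1) * pw] = \
            image[o_row * ph:(o_row + 1) * ph, o_col * pw:(o_col + 1) * pw]

        # Center of the destination slot, in heatmap coordinates.
        cy = (s_row + 0.5) * ph / heatmap_stride
        cx = (s_col + 0.5) * pw / heatmap_stride
        # Channel `src` gets a Gaussian peak at that center, mirroring how
        # keypoint heatmaps are built for pose estimation.
        heatmaps[src] = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

    return shuffled, heatmaps, perm
```

Under these assumptions, a network with the same heatmap head as a pose estimator could be trained to regress `heatmaps` from `shuffled` with a pixel-wise loss such as mean squared error, which keeps the objective non-contrastive. For example, `make_hsjp_sample(np.zeros((96, 72, 3), dtype=np.float32))` returns a shuffled image of the same shape and a label array with one channel per patch.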




