Human Motion Transfer from Poses in the Wild

by Jian Ren, et al.

In this paper, we tackle the problem of human motion transfer, where we synthesize a novel motion video for a target person that imitates the movement from a reference video. It is a video-to-video translation task in which the estimated poses are used to bridge two domains. Despite substantial progress on the topic, previous methods have several problems. First, there is a domain gap between training and testing pose sequences: the model is tested on poses it has not seen during training, such as difficult dancing moves. Furthermore, pose detection errors are inevitable, making the job of the generator harder. Finally, generating realistic pixels from sparse poses in a single step is challenging. To address these challenges, we introduce a novel pose-to-video translation framework for generating high-quality videos that are temporally coherent even for in-the-wild pose sequences unseen during training. We propose a pose augmentation method to minimize the training-test gap, a unified paired and unpaired learning strategy to improve the robustness to detection errors, and a two-stage network architecture to achieve superior texture quality. To further boost research on the topic, we build two human motion datasets. Finally, we show the superiority of our approach over the state-of-the-art studies through extensive experiments and evaluations on different datasets.


