SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

by Wenxuan Zhang, et al.

Generating talking head videos from a face image and a piece of speech audio still poses many challenges, i.e., unnatural head movement, distorted expression, and identity modification. We argue that these issues arise mainly from learning from coupled 2D motion fields. On the other hand, explicitly using 3D information also suffers from stiff expressions and incoherent video. We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation. To learn realistic motion coefficients, we explicitly model the connections between audio and the different types of motion coefficients individually. Specifically, we present ExpNet to learn accurate facial expressions from audio by distilling both coefficients and 3D-rendered faces. As for the head pose, we design PoseVAE, a conditional VAE, to synthesize head motion in different styles. Finally, the generated 3D motion coefficients are mapped to the unsupervised 3D keypoint space of the proposed face render to synthesize the final video. We conduct extensive experiments to show the superiority of our method in terms of motion and video quality.
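The separation described in the abstract, with one branch for expression and another for style-conditioned head pose, can be illustrated with a small sketch. This is not the authors' implementation: the coefficient dimensions, the toy linear "decoder", and every name below (`MotionCoefficients`, `toy_expression`, `toy_pose`, `generate_coefficients`) are hypothetical and chosen only to show the data flow. Trained networks (ExpNet, PoseVAE) would take the place of the toy functions.

```python
import random
from dataclasses import dataclass
from typing import List

# Illustrative sizes only; the abstract does not state the real dimensions.
EXPR_DIM = 4
POSE_DIM = 3
LATENT_DIM = 2
NUM_STYLES = 3


@dataclass
class MotionCoefficients:
    """Per-frame motion coefficients, kept as two separate groups."""
    expression: List[float]
    head_pose: List[float]


def toy_expression(audio_frame: List[float]) -> List[float]:
    """Stand-in for an audio-to-expression model: a fixed function of the frame mean."""
    mean = sum(audio_frame) / len(audio_frame)
    return [0.1 * (k + 1) * mean for k in range(EXPR_DIM)]


def toy_pose(style_id: int, rng: random.Random) -> List[float]:
    """Stand-in for a conditional-VAE decoder: sample z ~ N(0, I), append a
    one-hot style code, and apply a fixed (untrained) linear map."""
    z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    style = [1.0 if i == style_id else 0.0 for i in range(NUM_STYLES)]
    decoder_input = z + style
    # Hypothetical weights; a learned decoder would replace this table.
    weights = [
        [0.01 * (r + 1) * (c + 1) for c in range(len(decoder_input))]
        for r in range(POSE_DIM)
    ]
    return [sum(w * x for w, x in zip(row, decoder_input)) for row in weights]


def generate_coefficients(
    audio_frames: List[List[float]], style_id: int, seed: int = 0
) -> List[MotionCoefficients]:
    """Produce one MotionCoefficients per audio frame, modeling the
    expression and head-pose branches independently."""
    rng = random.Random(seed)
    return [
        MotionCoefficients(
            expression=toy_expression(frame),
            head_pose=toy_pose(style_id, rng),
        )
        for frame in audio_frames
    ]
```

In this sketch, changing `style_id` alters only the head-pose branch while the expression coefficients stay tied to the audio, which mirrors the idea of modeling the two coefficient types individually.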




StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation

We propose StyleTalker, a novel audio-driven talking head generation mod...

Audio-driven Talking Face Video Generation with Natural Head Pose

Real-world talking faces often accompany with natural head movement. How...

Audio-Visual Face Reenactment

This work proposes a novel method to generate realistic talking head vid...

ManVatar: Fast 3D Head Avatar Reconstruction Using Motion-Aware Neural Voxels

With NeRF widely used for facial reenactment, recent methods can recover...

DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions

For realistic talking head generation, creating natural head motion whil...

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis

Audio driven talking head synthesis is a challenging task that attracts ...

A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation

Animating still face images with deep generative models using a speech i...
