DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

by Fa-Ting Hong, et al.

Predominant techniques for talking head generation depend largely on 2D information, including facial appearance and motion from input face images. However, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noise during generation. Dense 3D annotations for facial videos, though, are prohibitively costly to obtain. In this work, we first present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters or 3D geometry annotations in training. We further propose a strategy to learn pixel-level uncertainties to perceive more reliable rigid-motion pixels for geometry learning. Second, we design an effective geometry-guided facial keypoint estimation module, providing accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied to each generation layer, to capture facial geometries in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework can generate highly realistic-looking reenacted talking videos, establishing new state-of-the-art performance on these benchmarks. The code and trained models are publicly available on the GitHub project page at https://github.com/harlanhong/CVPR2022-DaGAN




