Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis

11/26/2022
by   Duomin Wang, et al.

We present a novel one-shot talking head synthesis method that achieves disentangled and fine-grained control over lip motion, eye blink and gaze, head pose, and emotional expression. We represent the different motions via disentangled latent representations and leverage an image generator to synthesize talking heads from them. To effectively disentangle each motion factor, we propose a progressive disentangled representation learning strategy that separates the factors in a coarse-to-fine manner: we first extract a unified motion feature from the driving signal, and then isolate each fine-grained motion from the unified feature. We introduce motion-specific contrastive learning and regression for the non-emotional motions, and feature-level decorrelation and self-reconstruction for emotional expression, to fully exploit the inherent properties of each motion factor in unstructured video data and achieve disentanglement. Experiments show that our method provides high-quality speech and lip-motion synchronization along with precise, disentangled control over multiple extra facial motions, which can hardly be achieved by previous methods.
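To make the feature-level decorrelation idea concrete, here is a minimal NumPy sketch (an illustrative assumption, not the authors' implementation): given a batch of emotion features and a batch of non-emotion motion features, the loss penalizes the entries of their cross-correlation matrix, pushing the two representations toward linear independence.

```python
import numpy as np

def decorrelation_loss(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Penalize linear correlation between two batches of features.

    feat_a: (batch, d_a), e.g. emotion features.
    feat_b: (batch, d_b), e.g. the remaining motion features.
    Returns the mean squared entry of the cross-correlation matrix;
    zero means the two feature sets are linearly decorrelated.
    """
    # Standardize each feature dimension over the batch.
    a = (feat_a - feat_a.mean(0)) / (feat_a.std(0) + 1e-8)
    b = (feat_b - feat_b.mean(0)) / (feat_b.std(0) + 1e-8)
    # Cross-correlation matrix, shape (d_a, d_b).
    corr = a.T @ b / a.shape[0]
    # Drive every cross-correlation entry toward zero.
    return float((corr ** 2).mean())
```

In training, a term like this would be added to the overall objective so that information about expression cannot leak into the pose, gaze, or lip-motion codes; uncorrelated feature batches yield a loss near zero, while shared content inflates it.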


