StableFace: Analyzing and Improving Motion Stability for Talking Face Generation

08/29/2022
by   Jun Ling, et al.

While previous speech-driven talking face generation methods have made significant progress in improving the visual quality and lip-sync quality of synthesized videos, they pay less attention to lip motion jitters, which greatly undermine the realism of talking face videos. What causes motion jitters, and how can the problem be mitigated? In this paper, we conduct a systematic analysis of the motion jittering problem based on a state-of-the-art pipeline that uses 3D face representations to bridge the input audio and output video, and we improve motion stability with a series of effective designs. We find that several issues can lead to jitters in synthesized talking face video: 1) jitters in the input 3D face representations; 2) training-inference mismatch; and 3) a lack of dependency modeling among video frames. Accordingly, we propose three effective solutions: 1) a Gaussian-based adaptive smoothing module that smooths the 3D face representations to eliminate jitters in the input; 2) erosion augmentations applied to the input of the neural renderer during training to simulate inference-time distortion and reduce the mismatch; and 3) an audio-fused transformer generator that models dependency among video frames. In addition, since there is no off-the-shelf metric for measuring motion jitters in talking face video, we devise an objective metric, the Motion Stability Index (MSI), which quantifies motion jitters as the reciprocal of the variance of acceleration. Extensive experimental results show the superiority of our method for motion-stable face video generation, with better quality than previous systems.
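The MSI described above can be sketched in a few lines: acceleration is approximated by second-order temporal differences of tracked facial points, and the index is the reciprocal of the acceleration variance (larger values mean steadier motion). This is a minimal illustration, not the paper's exact formulation; the `(T, N, 2)` landmark input format, the averaging over points, and the epsilon stabilizer are assumptions.

```python
import numpy as np

def motion_stability_index(landmarks: np.ndarray, eps: float = 1e-8) -> float:
    """MSI-style jitter metric (sketch).

    landmarks: array of shape (T, N, 2) -- 2D positions of N facial
    points over T video frames (hypothetical input format).
    """
    # First temporal difference approximates velocity, the second
    # approximates acceleration.
    velocity = np.diff(landmarks, axis=0)
    acceleration = np.diff(velocity, axis=0)
    # Variance of acceleration over time, averaged over points/axes.
    var_acc = acceleration.var(axis=0).mean()
    # Reciprocal: low acceleration variance => high stability index.
    return float(1.0 / (var_acc + eps))
```

Under this definition a smoothly moving sequence (near-zero acceleration) scores far higher than the same sequence with frame-to-frame noise added, which matches the intended use of MSI for comparing jittery versus stable generations.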


