Synthesising 3D Facial Motion from "In-the-Wild" Speech

04/15/2019
by Panagiotis Tzirakis, et al.

Synthesising 3D facial motion from speech is a crucial problem with many applications, such as computer games and movies. Recently proposed methods tackle this problem only under controlled speech conditions. In this paper, we introduce the first methodology for 3D facial motion synthesis from speech that is captured in arbitrary recording conditions ("in-the-wild") and is independent of the speaker. For our purposes, we captured 4D sequences of people uttering the 500 words contained in Lip Reading Words (LRW), a publicly available large-scale in-the-wild dataset, and built a set of 3D blendshapes appropriate for speech. We correlate the 3D shape parameters of the speech blendshapes with the LRW audio samples by means of a novel time-warping technique, named Deep Canonical Attentional Warping (DCAW), that simultaneously learns hierarchical non-linear representations and a warping path in an end-to-end manner. We thoroughly evaluate our proposed methods and show that a deep learning model can synthesise 3D facial motion while handling different speakers and continuous speech signals in uncontrolled conditions.
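
To make the notion of a "warping path" concrete, the sketch below implements classic dynamic time warping (DTW) on two toy 1-D sequences. This is not DCAW: the abstract describes DCAW as learning deep representations and the warping jointly, whereas plain DTW aligns fixed features with dynamic programming. The function name `dtw_path` and the toy sequences are illustrative assumptions, not taken from the paper.

```python
def dtw_path(x, y):
    """Classic dynamic time warping between two 1-D sequences.

    Illustrative only (not the paper's DCAW). Returns (total_cost, path),
    where path is a list of (i, j) pairs aligning x[i] with y[j].
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # local distance between samples
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
                cost[i - 1][j - 1],  # advance in both
            )

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy stand-ins for two differently timed versions of the same signal.
slow = [0, 1, 1, 2, 3]
fast = [0, 1, 2, 3]
total, path = dtw_path(fast, slow)
print(total, path)
```

Here the repeated sample in `slow` is absorbed by mapping one frame of `fast` to two frames of `slow`, which is the kind of temporal alignment a warping path encodes.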


research
01/19/2017

3D Face Morphable Models "In-the-Wild"

3D Morphable Models (3DMMs) are powerful statistical models of 3D facial...
research
05/08/2019

Capture, Learning, and Synthesis of 3D Speaking Styles

Audio-driven 3D facial animation has been widely explored, but achieving...
research
09/20/2023

FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion

Speech-driven 3D facial animation synthesis has been a challenging task ...
research
12/23/2020

CN-Celeb: multi-genre speaker recognition

Research on speaker recognition is extending to address the vulnerabilit...
research
04/18/2022

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion

We present a framework for modeling interactional communication in dyadi...
research
05/25/2023

ReactFace: Multiple Appropriate Facial Reaction Generation in Dyadic Interactions

In dyadic interaction, predicting the listener's facial reactions is cha...
research
10/16/2018

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Large-scale datasets have successively proven their fundamental importan...
