Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks

by Najmeh Sadoughi, et al.

Articulation, emotion, and personality play strong roles in orofacial movements. To improve the naturalness and expressiveness of virtual agents (VAs), it is important to carefully model the complex interplay between these factors. This paper proposes a conditional generative adversarial network, called conditional sequential GAN (CSG), which learns the relationship between emotion and lexical content in a principled manner. The model uses a set of articulatory and emotional features extracted directly from the speech signal as conditioning inputs and generates realistic movements. A key feature of the approach is that it is a speech-driven framework that does not require transcripts. Our experiments show the superiority of this model over three state-of-the-art baselines in both objective and subjective evaluations. When the target emotion is known, we propose to create emotion-dependent models by either adapting the base model with the target emotional data (CSG-Emo-Adapted) or adding emotional conditions as inputs to the model (CSG-Emo-Aware). Objective evaluations of these models show improvements for CSG-Emo-Adapted over the CSG model, as its trajectory sequences are closer to the original sequences. Subjective evaluations show significantly better results for CSG-Emo-Adapted than for the CSG model when the target emotion is happiness.




