Show Me Your Face, And I'll Tell You How You Speak

06/28/2022
by Christen Millerdurai, et al.

When we speak, the prosody and content of our speech can be inferred from the movement of our lips. In this work, we explore the task of lip-to-speech synthesis, i.e., learning to generate speech given only the lip movements of a speaker. We focus on learning accurate lip-to-speech mappings for multiple speakers in unconstrained, large-vocabulary settings. We capture the speaker's voice identity through their facial characteristics, i.e., age, gender, and ethnicity, and condition on them together with the lip movements to generate speaker-identity-aware speech. To this end, we present a novel method, "Lip2Speech", with key design choices to achieve accurate lip-to-speech synthesis in unconstrained scenarios. We also perform various experiments and an extensive evaluation using quantitative and qualitative metrics as well as human evaluation.

research
05/17/2020

Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis

Humans involuntarily tend to infer parts of the conversation from lip mo...
research
09/01/2022

Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild

In this work, we address the problem of generating speech from silent li...
research
09/09/2022

Reconstructing the Dynamic Directivity of Unconstrained Speech

An accurate model of natural speech directivity is an important step tow...
research
03/31/2022

Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

Since facial actions such as lip movements contain significant informati...
research
11/01/2022

Generating Gender-Ambiguous Text-to-Speech Voices

The gender of a voice assistant or any voice user interface is a central...
research
09/20/2023

TRAVID: An End-to-End Video Translation Framework

In today's globalized world, effective communication with people from di...
research
03/28/2018

Lip Movements Generation at a Glance

Cross-modality generation is an emerging topic that aims to synthesize d...
