Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging

06/10/2020
by Yeqi Bai, et al.

While deep learning technologies are now capable of generating realistic images that can fool humans, research efforts are turning to image synthesis for more concrete, application-specific purposes. Facial image generation based on vocal characteristics from speech is one such important yet challenging task. It is a key enabler of influential use cases of image generation, especially for businesses in public security and entertainment. Existing solutions to the speech2face problem render limited image quality and fail to preserve facial similarity, owing to the lack of quality training datasets and of appropriate integration of vocal features. In this paper, we investigate these key technical challenges and propose Speech Fusion to Face, or SF2F for short, which attempts to address the issue of facial image quality and the poor connection between the vocal feature domain and modern image generation models. By adopting new strategies for the data model and training, we demonstrate a dramatic performance boost over the state-of-the-art solution, doubling the recall of individual identity and lifting the quality score from 15 to 19 based on the mutual information score with a VGGFace classifier.
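
The abstract does not define the mutual information score. As a rough illustration only, the sketch below assumes it follows the familiar Inception-Score pattern: the exponential of the average KL divergence between a classifier's per-image class distribution and the marginal class distribution over all generated images. The function name `mutual_information_score` and the input layout are hypothetical, and the probabilities would come from a face classifier such as VGGFace, which is not included here.

```python
import math


def mutual_information_score(probs):
    """Inception-Score-style metric (an assumption, not the paper's stated formula).

    probs: list of per-image class-probability lists, each summing to 1.
    Returns exp(mean over images of KL(p(y|x) || p(y))).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    total_kl = 0.0
    for p in probs:
        for j in range(k):
            # Terms with p[j] == 0 contribute nothing to the KL divergence.
            if p[j] > 0.0:
                total_kl += p[j] * math.log(p[j] / marginal[j])
    return math.exp(total_kl / n)


if __name__ == "__main__":
    # Two confidently classified images of different identities.
    print(mutual_information_score([[1.0, 0.0], [0.0, 1.0]]))
```

Under this assumed definition, the score ranges from 1 (every image gets the same prediction) up to the number of classes (each image is classified confidently and the identities are evenly spread), so a higher value suggests sharper and more diverse generated faces.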


research
11/11/2018

Deep Face Quality Assessment

Face image quality is an important factor in facial recognition systems ...
research
12/04/2018

FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis

The advance of Generative Adversarial Networks (GANs) enables realistic ...
research
11/03/2021

Adversarially Perturbed Wavelet-based Morphed Face Generation

Morphing is the process of combining two or more subjects in an image in...
research
08/30/2021

StackGAN: Facial Image Generation Optimizations

Current state-of-the-art photorealistic generators are computationally e...
research
12/05/2019

MetalGAN: Multi-Domain Label-Less Image Synthesis Using cGANs and Meta-Learning

Image synthesis is currently one of the most addressed image processing ...
research
08/04/2022

Artificial Image Tampering Distorts Spatial Distribution of Texture Landmarks and Quality Characteristics

Advances in AI based computer vision has led to a significant growth in ...
research
12/12/2019

Speech-driven facial animation using polynomial fusion of features

Speech-driven facial animation involves using a speech signal to generat...
