Self-supervised Learning of Visual Speech Features with Audiovisual Speech Enhancement

04/25/2020
by Zakaria Aldeneh, et al.

We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show that visual features provide not only high-level information about speech activity, i.e., speech vs. no speech, but also fine-grained information about the place of articulation. An interesting byproduct of this finding is that the learned visual embeddings can be used as features for other visual speech applications. We demonstrate the effectiveness of the learned visual representations for classifying visemes (the visual analogue of phonemes). Our results provide insight into important aspects of audiovisual speech enhancement and demonstrate how such models can be used as self-supervision tasks for visual speech applications.

research
11/18/2022

Exploring WavLM on Speech Enhancement

There is a surge in interest in self-supervised learning approaches for ...
research
09/14/2023

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

Speech enhancement systems are typically trained using pairs of clean an...
research
09/10/2023

Gray Jedi MVDR Post-filtering

Spatial filters can exploit deep-learning-based speech enhancement model...
research
07/25/2023

Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations

Self-supervised speech representations (SSSRs) have been successfully ap...
research
03/03/2020

Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data

While deep learning systems have gained significant ground in speech enh...
research
06/10/2022

Feature Learning and Ensemble Pre-Tasks Based Self-Supervised Speech Denoising and Dereverberation

Self-supervised learning (SSL) achieves great success in monaural speech...
research
06/15/2016

Multi-Modal Hybrid Deep Neural Network for Speech Enhancement

Deep Neural Networks (DNN) have been successful in enhancing noisy spe...
