Towards Intelligibility-Oriented Audio-Visual Speech Enhancement

11/18/2021
by Tassadaq Hussain, et al.

Existing deep learning (DL)-based speech enhancement (SE) approaches are generally optimised to minimise the distance between clean and enhanced speech features. These often improve speech quality; however, they generalise poorly and may not deliver the required speech intelligibility in real noisy situations. To address these challenges, researchers have explored intelligibility-oriented (I-O) loss functions and the integration of audio-visual (AV) information for more robust SE. In this paper, we introduce DL-based I-O SE algorithms that exploit AV information, a novel and previously unexplored research direction. Specifically, we present a fully convolutional AV SE model that uses a modified short-time objective intelligibility (STOI) metric as its training cost function. To the best of our knowledge, this is the first work to combine AV modalities with an I-O loss function for SE. Comparative experimental results show that the proposed I-O AV SE framework outperforms audio-only (AO) and AV models trained with conventional distance-based loss functions, in terms of standard objective evaluation measures, when dealing with unseen speakers and noises.
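The abstract names a modified STOI metric as the training cost but does not spell out the modification. As a rough illustration of why a STOI-style objective behaves differently from a distance-based one, here is a minimal NumPy sketch of the standard STOI intermediate measure: short-time correlation between clean and enhanced band envelopes, after per-band energy normalisation and clipping. It is not the authors' loss. Assumptions: the inputs are precomputed, non-negative one-third-octave band envelopes of shape (bands, frames); `stoi_like_loss` and `mse_loss` are hypothetical helper names; the 30-frame segment length and -15 dB clipping bound follow the standard STOI definition, not necessarily this paper; and the code is not differentiable as written.

```python
import numpy as np


def mse_loss(clean_env, enhanced_env):
    """Conventional distance-based loss: mean squared error between features."""
    clean_env = np.asarray(clean_env, dtype=float)
    enhanced_env = np.asarray(enhanced_env, dtype=float)
    return float(np.mean((clean_env - enhanced_env) ** 2))


def stoi_like_loss(clean_env, enhanced_env, seg_len=30, beta_db=-15.0, eps=1e-8):
    """Simplified STOI-style loss on band envelopes (hypothetical helper).

    clean_env, enhanced_env: non-negative arrays of shape (bands, frames),
    assumed to be precomputed one-third-octave band magnitudes.
    Returns 1 - (mean short-time envelope correlation), so lower is better.
    """
    x_all = np.asarray(clean_env, dtype=float)
    y_all = np.asarray(enhanced_env, dtype=float)
    if x_all.shape != y_all.shape:
        raise ValueError("clean and enhanced envelopes must have the same shape")
    n_frames = x_all.shape[1]
    if n_frames < seg_len:
        raise ValueError("need at least seg_len frames")

    # Upper bound on the normalised enhanced envelope: (1 + 10^(-beta/20)) * clean.
    clip_factor = 1.0 + 10.0 ** (-beta_db / 20.0)
    seg_scores = []
    for end in range(seg_len, n_frames + 1):
        x = x_all[:, end - seg_len:end]
        y = y_all[:, end - seg_len:end]
        # Scale each band of y to the energy of the matching clean band, then clip.
        alpha = np.linalg.norm(x, axis=1, keepdims=True) / (
            np.linalg.norm(y, axis=1, keepdims=True) + eps)
        y = np.minimum(alpha * y, clip_factor * x)
        # Per-band correlation coefficient over the segment.
        xc = x - x.mean(axis=1, keepdims=True)
        yc = y - y.mean(axis=1, keepdims=True)
        num = np.sum(xc * yc, axis=1)
        den = np.linalg.norm(xc, axis=1) * np.linalg.norm(yc, axis=1) + eps
        seg_scores.append(num / den)

    # Average over bands and segments to get the intelligibility score d.
    d = float(np.mean(seg_scores))
    return 1.0 - d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((15, 100)) + 0.1  # synthetic: 15 bands x 100 frames
    louder = 3.0 * clean                 # same envelope shape, different gain
    noisy = clean + 0.5 * rng.random((15, 100))
    # MSE penalises the pure gain change; the STOI-like loss stays near zero.
    print("gain change:", mse_loss(clean, louder), stoi_like_loss(clean, louder))
    print("additive noise:", mse_loss(clean, noisy), stoi_like_loss(clean, noisy))
```

A pure gain change leaves the STOI-like loss near zero while MSE penalises it heavily. This illustrates how a correlation-based objective targets the shape of the speech envelope rather than its absolute level. To use such a measure as a training cost, the same computation would have to be expressed in an automatic-differentiation framework.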

Related research
02/08/2022

A Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning for Hearing-Assistive Technologies

Current deep learning (DL) based approaches to speech intelligibility en...
11/15/2018

On Training Targets and Objective Functions for Deep-Learning-Based Audio-Visual Speech Enhancement

Audio-visual speech enhancement (AV-SE) is the task of improving speech ...
02/11/2022

A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning

Current deep learning (DL) based approaches to speech intelligibility en...
09/19/2023

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement

Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech ...
12/16/2021

Towards Robust Real-time Audio-Visual Speech Enhancement

The human brain contextually exploits heterogeneous sensory information ...
09/26/2019

Seeing Voices in Noise: A Study of Audiovisual-Enhanced Vocoded Speech Intelligibility in Cochlear Implant Simulation

Speech perception is a key to verbal communication. For people with hear...
09/23/2019

CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement

Noisy situations cause huge problems for sufferers of hearing loss as hear...
