Monaural Speech Enhancement using Deep Neural Networks by Maximizing a Short-Time Objective Intelligibility Measure

02/02/2018
by Morten Kolbæk, et al.

In this paper we propose a Deep Neural Network (DNN)-based Speech Enhancement (SE) system that is designed to maximize an approximation of the Short-Time Objective Intelligibility (STOI) measure. We formalize an approximate-STOI cost function, derive analytical expressions for the gradients required for DNN training, and show that these gradients have desirable properties when used with gradient-based optimization techniques. Simulation experiments show that the proposed SE system achieves large improvements in estimated speech intelligibility when tested on matched and unmatched natural noise types at multiple signal-to-noise ratios. Furthermore, we show that the SE system, when trained with the approximate-STOI cost function, performs on par with a system trained with a mean square error (MSE) cost applied to short-time temporal envelopes. Finally, we show that the proposed SE system performs on par with a traditional DNN-based Short-Time Spectral Amplitude (STSA) SE system in terms of estimated speech intelligibility. These results are important because they suggest that traditional DNN-based STSA SE systems might be optimal in terms of estimated speech intelligibility.
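The abstract does not spell out the approximate-STOI cost. The following is a minimal sketch that makes two assumptions. First, like the original STOI measure, the cost averages sample correlation coefficients between clean and enhanced short-time temporal envelopes. Second, it omits STOI's clipping step. The one-third octave band decomposition that produces the envelope segments is assumed to exist elsewhere. All function names here (`envelope_correlation`, `correlation_grad`, `approx_stoi`, `envelope_mse`) are hypothetical.

For centered segments x̄ and ȳ, the correlation is r = x̄ᵀȳ / (‖x̄‖‖ȳ‖). Its gradient with respect to the estimate y is ∂r/∂y = x̄ / (‖x̄‖‖ȳ‖) - r ȳ / ‖ȳ‖². This is the standard closed form for a correlation coefficient and is not necessarily the exact expression derived in the paper.

```python
import math
from typing import List, Sequence

# Hypothetical representation: each segment is one short-time temporal
# envelope (consecutive frames of one frequency band). Producing these
# segments from audio is assumed to happen elsewhere.
Segment = Sequence[float]


def _center(v: Segment) -> List[float]:
    """Subtract the sample mean from a segment."""
    mu = sum(v) / len(v)
    return [a - mu for a in v]


def _norm(v: Segment) -> float:
    """Euclidean norm of a segment."""
    return math.sqrt(sum(a * a for a in v))


def envelope_correlation(x: Segment, y: Segment) -> float:
    """Sample correlation between clean envelope x and estimated envelope y.

    Assumes neither segment is constant (zero variance).
    """
    xc, yc = _center(x), _center(y)
    return sum(a * b for a, b in zip(xc, yc)) / (_norm(xc) * _norm(yc))


def correlation_grad(x: Segment, y: Segment) -> List[float]:
    """Analytic gradient of envelope_correlation(x, y) with respect to y."""
    xc, yc = _center(x), _center(y)
    nx, ny = _norm(xc), _norm(yc)
    r = sum(a * b for a, b in zip(xc, yc)) / (nx * ny)
    return [a / (nx * ny) - r * b / (ny * ny) for a, b in zip(xc, yc)]


def approx_stoi(clean: Sequence[Segment], est: Sequence[Segment]) -> float:
    """Mean envelope correlation over all segment pairs (no clipping)."""
    total = sum(envelope_correlation(x, y) for x, y in zip(clean, est))
    return total / len(clean)


def approx_stoi_loss(clean: Sequence[Segment], est: Sequence[Segment]) -> float:
    """Cost to minimize during training: the negative approximate STOI."""
    return -approx_stoi(clean, est)


def envelope_mse(clean: Sequence[Segment], est: Sequence[Segment]) -> float:
    """Alternative cost: mean squared error between the same envelopes."""
    total = sum((a - b) ** 2 for x, y in zip(clean, est) for a, b in zip(x, y))
    count = sum(len(x) for x in clean)
    return total / count


if __name__ == "__main__":
    clean = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.1, 0.9, 0.3]]
    # A gain-scaled copy of every clean envelope.
    est = [[2.0 * a for a in seg] for seg in clean]
    print(approx_stoi(clean, est))   # approximately 1.0: the gain error is ignored
    print(envelope_mse(clean, est))  # greater than 0: MSE penalizes the gain error
```

A DNN toolkit would normally obtain this gradient by automatic differentiation. Writing it out shows two properties. The gradient is orthogonal to ȳ, and its entries sum to zero, so the cost is insensitive to gain and offset changes in the estimated envelope. This is one plausible reading of the "desirable properties" the abstract mentions, not a restatement of the paper's analysis. The usage example also shows how the correlation-based cost and the envelope MSE cost compared in the abstract differ: a pure gain error leaves the correlation at 1 but still produces a nonzero MSE.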
