Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks

12/02/2020
by Felix Grezes, et al.

Recent work has shown that deep recurrent neural networks using the long short-term memory (LSTM) architecture can achieve strong single-channel speech enhancement by estimating time-frequency masks. However, these models do not naturally generalize to multi-channel inputs from varying microphone configurations. Spatial clustering techniques, in contrast, can achieve such generalization but lack a strong signal model. Our work combines the two approaches. By using LSTMs to enhance spatial-clustering-based time-frequency masks, we obtain both the signal modeling performance of multiple single-channel LSTM-DNN speech enhancers and the signal separation performance and generality of multi-channel spatial clustering. We compare our proposed system to several baselines on the CHiME-3 dataset. We evaluate the audio quality of each system using the signal-to-distortion ratio (SDR) from the BSS_eval toolkit and the perceptual evaluation of speech quality (PESQ) metric. We evaluate the intelligibility of each system's output using the word error rate (WER) of a Kaldi automatic speech recognizer.
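The abstract names its evaluation metrics without defining them. Below is a minimal pure-Python sketch of two of them, under simplifying assumptions. `simple_sdr` and `word_error_rate` are hypothetical helper names, not functions from BSS_eval or Kaldi. `simple_sdr` uses the plain definition 10·log10(‖s‖² / ‖s − ŝ‖²). BSS_eval's SDR instead first projects the estimate onto the reference, allowing a short time-invariant filtering distortion, before measuring the error, so its values will generally differ. PESQ (ITU-T P.862) is a full perceptual model and is not sketched here.

```python
import math
from typing import Sequence


def simple_sdr(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Plain signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Simplified, hypothetical helper: unlike BSS_eval, no distortion filter is
    allowed, so any gain or filtering of the estimate counts as error.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein (edit-distance) alignment and returned
    as a fraction; multiply by 100 for a percentage.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" finds one substitution and one deletion over six reference words, a WER of 1/3 (about 33%). Kaldi's scoring reports WER as a percentage computed from the same insertion, deletion, and substitution counts.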

Related research

Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement (12/02/2020)
Recurrent neural networks using the LSTM architecture can achieve signif...

Refining DNN-based Mask Estimation using CGMM-based EM Algorithm for Multi-channel Noise Reduction (09/18/2023)
In this paper, we present a method that allows to further improve speech...

Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks (12/02/2020)
Spatial clustering techniques can achieve significant multi-channel nois...

Cleanformer: A microphone array configuration-invariant, streaming, multichannel neural enhancement frontend for ASR (04/25/2022)
This work introduces the Cleanformer, a streaming multichannel neural ba...

Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition (06/30/2020)
Acoustic models in real-time speech recognition systems typically stack ...

Single-Channel Speech Dereverberation using Subband Network with A Reverberation Time Shortening Target (04/19/2022)
This work proposes a subband network for single-channel speech dereverbe...