FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation

02/02/2019
by Ziqiang Shi, et al.

Deep gated convolutional networks have proven very effective for single-channel speech separation. However, current state-of-the-art frameworks typically train the gated convolutional networks in the time-frequency (TF) domain. Such an approach limits the achievable scores, for example by imposing an upper bound on the signal-to-distortion ratio (SDR) of the separated utterances, and it fails to exploit an end-to-end framework. In this paper we present a simple, effective, integrated end-to-end approach to monaural speech separation. It consists of a deep gated convolutional neural network (GCNN) that takes the mixed utterance of two speakers and maps it to two separated utterances, each containing only one speaker's voice. In addition, long short-term memory (LSTM) is employed for long-term temporal modeling. For the objective, we propose to train the network by directly optimizing utterance-level SDR in a permutation invariant training (PIT) style. Our experiments on the public WSJ0-2mix corpus demonstrate that this scheme produces more discriminative separated utterances and improves performance on the speaker separation task.
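
The objective described in the abstract, utterance-level SDR scored in a permutation invariant way, can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: SDR is computed as the plain energy ratio 10·log10(‖s‖² / ‖s − ŝ‖²) rather than the BSS Eval variant, signals are plain Python lists, and the names `sdr_db` and `pit_sdr` are hypothetical. A trainable version would use a differentiable framework and minimize the negative of this score.

```python
import itertools
import math


def sdr_db(reference, estimate, eps=1e-8):
    """Utterance-level SDR in dB, using a simple energy-ratio definition.

    Assumption: SDR = 10 * log10(||s||^2 / ||s - s_hat||^2); this is not
    necessarily the exact metric used in the paper.
    """
    signal_energy = sum(r * r for r in reference)
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


def pit_sdr(references, estimates):
    """Return (best mean SDR, best permutation) over all speaker assignments.

    best_perm[i] is the index of the estimate assigned to reference i.
    """
    n = len(references)
    best_score, best_perm = -math.inf, None
    for perm in itertools.permutations(range(n)):
        score = sum(sdr_db(references[i], estimates[p])
                    for i, p in enumerate(perm)) / n
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


# Toy example: two sinusoidal "speakers", with the network outputs
# arriving in swapped order and carrying a small constant error.
sample_rate = 8000
t = [k / sample_rate for k in range(sample_rate)]
s1 = [math.sin(2 * math.pi * 220 * x) for x in t]
s2 = [math.sin(2 * math.pi * 330 * x) for x in t]
refs = [s1, s2]
ests = [[v + 0.01 for v in s2], [v + 0.01 for v in s1]]

score, perm = pit_sdr(refs, ests)
loss = -score  # a training loop would minimize the negative SDR
```

Because every output-to-speaker assignment is scored and only the best one is kept, the swapped output order in the toy example is not penalized; the search is exhaustive, which is cheap for the two-speaker case considered here.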

