Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

10/27/2022
by William Ravenscroft, et al.

Speech separation models are used to isolate individual speakers in many speech processing applications. Deep learning models have been shown to achieve state-of-the-art (SOTA) results on a number of speech separation benchmarks. One such class of models, known as temporal convolutional networks (TCNs), has shown promising results for speech separation tasks. A limitation of these models is that they have a fixed receptive field (RF). Recent research in speech dereverberation has shown that the optimal RF of a TCN varies with the reverberation characteristics of the speech signal. In this work, deformable convolution is proposed as a way to give TCN models dynamic RFs that can adapt to various reverberation times for reverberant speech separation. The proposed models achieve an 11.1 dB average scale-invariant signal-to-distortion ratio (SISDR) improvement over the input signal on the WHAMR benchmark. A relatively small deformable TCN model with 1.3M parameters is also proposed, which gives separation performance comparable to larger and more computationally complex models.
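The headline figure in the abstract is an SISDR improvement, that is, the SISDR of the separated estimate minus the SISDR of the unprocessed mixture, both measured against the clean reference. The sketch below uses the commonly cited definition of scale-invariant SDR and is not taken from the paper's evaluation code. The function names `si_sdr` and `si_sdr_improvement` are hypothetical, and the signals are assumed to be plain, equal-length sequences of samples with a non-zero reference and a non-zero residual.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (commonly used definition, assumed here)."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    # Project the estimate onto the reference to find the optimal scaling.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy
    # Scaled reference is the "target" part; the remainder is distortion.
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """SISDR of the estimate minus SISDR of the input mixture, in dB."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy example: the interference is orthogonal to the reference signal.
reference = [1.0, 0.0, -1.0, 0.0]
interference = [0.0, 1.0, 0.0, 1.0]
mixture = [r + i for r, i in zip(reference, interference)]
estimate = [r + 0.1 * i for r, i in zip(reference, interference)]

improvement = si_sdr_improvement(estimate, mixture, reference)
```

In this toy case the estimate keeps a tenth of the interference amplitude, so the improvement comes out at about 20 dB. Rescaling the estimate by any non-zero constant leaves the score unchanged, which is the property that makes the metric scale-invariant.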

research
05/17/2022

Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation

Speech dereverberation is an important stage in many speech technology a...
research
04/13/2022

Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation

Speech dereverberation is often an important requirement in robust speec...
research
07/01/2020

Exploring the time-domain deep attractor network with two-stream architectures in a reverberant environment

With the success of deep learning in speech signal processing, speaker-i...
research
04/14/2023

On Data Sampling Strategies for Training Neural Network Speech Separation Models

Speech separation remains an important area of multi-speaker signal proc...
research
02/12/2019

FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks

Deep dilated temporal convolutional networks (TCN) have been proved to b...
research
06/09/2020

An Efficient Accelerator Design Methodology for Deformable Convolutional Networks

Deformable convolutional networks have demonstrated outstanding performa...
research
04/14/2020

Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment

In daily listening environments, speech is always distorted by backgroun...