Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation

by Xue Yang, et al.

Speaker-independent speech separation has achieved remarkable performance in recent years with the development of deep neural networks (DNNs). Various network architectures, from the traditional convolutional neural network (CNN) and recurrent neural network (RNN) to the more advanced transformer, have been carefully designed to improve separation performance. However, state-of-the-art models usually suffer from computational drawbacks, such as large model size, high memory consumption, and high computational complexity. To balance separation performance against computational efficiency, and to further explore the modeling ability of traditional network structures, we combine RNNs with a newly proposed variant of the convolutional network to address the speech separation problem. By embedding two RNNs into the basic block of this variant with the help of a dual-path strategy, the proposed network can effectively learn both local information and global dependencies. In addition, a four-stage structure allows separation to be performed gradually at finer and finer scales as the feature dimension increases. Experimental results on various datasets demonstrate the effectiveness of the proposed method and show that it achieves a good trade-off between separation performance and computational efficiency.
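The dual-path idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' network: the abstract does not give the block design, so the non-overlapping chunking, the chunk length, and the `toy_recurrence` stand-in for an RNN are all assumptions made here for illustration. The sketch only shows the two passes: one recurrence runs inside each chunk (local information), and a second runs across chunks at each position (global dependency).

```python
from typing import Callable, List

Sequence = List[float]


def segment(frames: Sequence, chunk_len: int) -> List[Sequence]:
    """Split frames into non-overlapping chunks, zero-padding the tail.

    Assumption: real dual-path models often use overlapping chunks;
    non-overlapping chunks keep this sketch short.
    """
    padded = frames + [0.0] * (-len(frames) % chunk_len)
    return [padded[i:i + chunk_len] for i in range(0, len(padded), chunk_len)]


def toy_recurrence(seq: Sequence, decay: float = 0.5) -> Sequence:
    """Hypothetical stand-in for an RNN: h_t = decay * h_{t-1} + x_t."""
    state, out = 0.0, []
    for x in seq:
        state = decay * state + x
        out.append(state)
    return out


def dual_path(chunks: List[Sequence],
              rnn: Callable[[Sequence], Sequence] = toy_recurrence) -> List[Sequence]:
    # Intra-chunk path: recur over positions inside each chunk (local information).
    intra = [rnn(chunk) for chunk in chunks]
    # Inter-chunk path: recur across chunks at each fixed position (global dependency).
    columns = [rnn(list(col)) for col in zip(*intra)]
    # Transpose back to the original (chunk, position) layout.
    return [list(row) for row in zip(*columns)]


chunks = segment([1.0, 2.0, 3.0, 4.0, 5.0], chunk_len=2)
result = dual_path(chunks)
```

With a chunk length close to the square root of the sequence length, each of the two recurrences sees a much shorter sequence than the full input, which is the usual motivation for the dual-path arrangement.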




Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Recent advances in the design of neural network architectures, in partic...

On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems

While the performance of offline neural speech separation systems has be...

Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning

Transformer has shown advanced performance in speech separation, benefit...

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

Recent studies in deep learning-based speech separation have proven the ...

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

The dominant speech separation models are based on complex recurrent or ...

On Data Sampling Strategies for Training Neural Network Speech Separation Models

Speech separation remains an important area of multi-speaker signal proc...

Dual-Path Modeling for Long Recording Speech Separation in Meetings

The continuous speech separation (CSS) is a task to separate the speech ...
