Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask

Singing voice separation based on deep learning relies on time-frequency masking. In many cases, the masking process is not a learnable function or is not incorporated into the deep learning optimization. Consequently, most existing methods rely on a post-processing step based on generalized Wiener filtering. This work proposes a method that learns and optimizes a source-dependent mask during training and does not require this post-processing step. We introduce a recurrent inference algorithm, a sparse transformation step that improves the mask generation process, and a learned denoising filter. Compared to previous state-of-the-art approaches for monaural singing voice separation, the results show an improvement of 0.49 dB in signal-to-distortion ratio (SDR) and 0.30 dB in signal-to-interference ratio (SIR).
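The abstract contrasts masking as a separate post-processing step (generalized Wiener filtering) with a mask that is generated and applied inside the model and refined by recurrent inference. The sketch below illustrates both ideas. It is not the paper's implementation. It assumes NumPy and magnitude spectrograms of shape (frequency, time). The function names, the convergence threshold `tau`, `max_iter`, and `toy_step` are hypothetical. The refinement loop is a generic fixed-point iteration standing in for a trained recurrent layer. The sparse transformation and the denoising filter are not modelled.

```python
import numpy as np


def generalized_wiener(mixture_stft, source_mags, alpha=2.0, eps=1e-12):
    """Generalized Wiener filtering, the kind of post-processing the abstract
    says many methods depend on. Each source j gets the mask
    |Y_j|^alpha / sum_k |Y_k|^alpha, applied to the (complex) mixture STFT."""
    powered = [np.abs(m) ** alpha for m in source_mags]
    total = np.sum(powered, axis=0) + eps
    return [(p / total) * mixture_stft for p in powered]


def recurrent_inference(h0, step, tau=1e-3, max_iter=20, eps=1e-12):
    """Hypothetical recurrent refinement: re-apply a mask-generating step to
    its own output until the relative squared change drops below tau or
    max_iter is reached. Returns the final mask and the iterations used."""
    h = h0
    for it in range(1, max_iter + 1):
        h_next = step(h)
        rel_change = np.sum((h_next - h) ** 2) / (np.sum(h ** 2) + eps)
        h = h_next
        if rel_change < tau:
            break
    return h, it


def skip_filter(mixture_mag, mask):
    """Skip-filtering connection: the mixture magnitude bypasses the mask
    generator and is multiplied by the mask, so masking is part of the
    model that gets optimized instead of a separate post-processing step."""
    return mask * mixture_mag


# Toy magnitude spectrograms, shape (freq_bins, frames).
rng = np.random.default_rng(0)
voice = rng.random((4, 3))
accomp = rng.random((4, 3))
mixture = voice + accomp

# Toy stand-in for a trained mask layer (assumption): it moves the current
# mask halfway toward the ratio voice / mixture. A real model learns this.
target = voice / (mixture + 1e-12)


def toy_step(mask):
    return 0.5 * mask + 0.5 * target


mask, n_iter = recurrent_inference(np.full_like(mixture, 0.5), toy_step)
voice_est = skip_filter(mixture, mask)
```

The Wiener variant needs every source estimate before it can build a mask, whereas the skip-filtering path produces one source directly from the mixture, which is the property that lets the mask be learned end to end.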
