Dilated Convolution with Dilated GRU for Music Source Separation

by Jen-Yu Liu, et al.

Stacked dilated convolutions, as used in WaveNet, have been shown to be effective for generating high-quality audio. By replacing pooling and striding with dilation in the convolution layers, they preserve high-resolution information while still reaching distant locations. Producing high-resolution predictions is also crucial in music source separation, whose goal is to separate different sound sources while maintaining the quality of the separated sounds. This paper therefore investigates stacked dilated convolutions as the backbone for music source separation. However, although stacked dilated convolutions reach a wider context than standard convolutions, their effective receptive fields are still fixed and may not be wide enough for complex music audio signals. To reach information at remote locations, we propose combining dilated convolution with a modified version of the gated recurrent unit (GRU), called the "Dilated GRU", to form a block. For a fixed k, a Dilated GRU unit receives information from k steps earlier instead of from the previous step. This modification allows a GRU unit to reach a distant location in fewer recurrent steps and to run faster, because it can execute partially in parallel. We show that the proposed model, built from a stack of such blocks, performs as well as or better than state-of-the-art models for separating vocals and accompaniment.
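
The recurrence described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar hidden state, the weight dictionary `p`, the zero initial state, and all function names are assumptions made for illustration, and the gate equations follow one common GRU convention. The sketch shows that feeding the state from k steps earlier splits the sequence into k independent chains, which is why the unit can run partially in parallel, and it computes the fixed receptive field of a stack of dilated convolutions with stride 1.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_cell(x, h, p):
    """One scalar GRU update; `p` holds hypothetical weights (an assumption)."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])          # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])          # reset gate
    n = math.tanh(p["wn"] * x + p["un"] * (r * h) + p["bn"])  # candidate state
    return (1.0 - z) * n + z * h


def dilated_gru(xs, k, p):
    """h[t] = gru_cell(xs[t], h[t - k]); states before the start are zero."""
    hs = []
    for t, x in enumerate(xs):
        prev = hs[t - k] if t >= k else 0.0
        hs.append(gru_cell(x, prev, p))
    return hs


def dilated_gru_interleaved(xs, k, p):
    """Same outputs, computed as k standard GRUs over strided subsequences."""
    hs = [0.0] * len(xs)
    for offset in range(k):  # the k chains never read each other's states
        h = 0.0
        for t in range(offset, len(xs), k):
            h = gru_cell(xs[t], h, p)
            hs[t] = h
    return hs


def stacked_dilated_receptive_field(kernel_size, dilations):
    """Receptive field, in samples, of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical weights and input, chosen only to exercise the functions.
params = {"wz": 0.5, "uz": -0.3, "bz": 0.1,
          "wr": 0.2, "ur": 0.4, "br": 0.0,
          "wn": 0.7, "un": -0.6, "bn": 0.05}
signal = [0.1 * i for i in range(12)]
direct = dilated_gru(signal, 4, params)
chained = dilated_gru_interleaved(signal, 4, params)
```

In this sketch, information from m·k samples back arrives after m recurrent steps instead of m·k, and the k chains in `dilated_gru_interleaved` could be batched together. By contrast, `stacked_dilated_receptive_field(3, [1, 2, 4, 8])` is a fixed number determined by the architecture, which is the limitation the abstract points to.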


