Spectrogram-channels u-net: a source separation model viewing each channel as the spectrogram of each source

10/26/2018
by Jaehoon Oh, et al.

Sound source separation is an interesting task for Music Information Retrieval (MIR) researchers, both because it is challenging in itself and because it is related to many other MIR tasks, such as automatic lyric transcription, singer identification, and voice conversion. In this paper, we propose an intuitive spectrogram-based model for source separation by adapting U-Net, which was originally proposed for biomedical image segmentation. We call it Spectrogram-Channels U-Net, meaning that each channel of the output corresponds to the spectrogram of one source. The proposed model can be used not only for singing voice separation but also for multi-instrument separation, simply by changing the number of output channels. In addition, we propose a loss function that balances the volumes of the stems. Finally, we obtain performance comparable to other state-of-the-art models on both separation tasks.
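The abstract does not spell out the output layout or the loss, so the following is only a rough NumPy sketch of the idea under stated assumptions. It assumes the network emits one mask per source (shape `(C, F, T)`) that is multiplied with the mixture magnitude spectrogram, and that "balancing" means an L1 loss weighted inversely to each stem's mean magnitude. The names `apply_masks` and `volume_balanced_l1`, the masking step, and the weighting rule are all hypothetical illustrations, not the paper's definitions.

```python
import numpy as np

def apply_masks(mixture_mag, masks):
    """Turn per-source masks into per-source spectrogram estimates.

    mixture_mag: (F, T) magnitude spectrogram of the mixture.
    masks:       (C, F, T) values in [0, 1], one channel per source (assumption).
    Returns an array of shape (C, F, T): channel c is the estimate for source c.
    """
    return masks * mixture_mag[None, :, :]

def volume_balanced_l1(estimates, targets, eps=1e-8):
    """Channel-weighted L1 loss (hypothetical weighting, not the paper's formula).

    Each channel's mean absolute error is weighted inversely to the mean
    magnitude of its target stem, so quiet stems are not dominated by loud ones.
    """
    per_channel = np.abs(estimates - targets).mean(axis=(1, 2))  # shape (C,)
    volume = np.abs(targets).mean(axis=(1, 2)) + eps             # shape (C,)
    weights = 1.0 / volume
    weights = weights / weights.sum()                            # weights sum to 1
    return float((weights * per_channel).sum())

# Toy usage: C=2 sources (e.g. vocals and accompaniment), F=4 bins, T=3 frames.
rng = np.random.default_rng(0)
mixture = rng.random((4, 3))
masks = rng.random((2, 4, 3))
estimates = apply_masks(mixture, masks)
targets = rng.random((2, 4, 3))
loss = volume_balanced_l1(estimates, targets)
```

Under these assumptions, moving from singing voice separation (C = 2) to multi-instrument separation only changes the size of the leading channel axis; the loss code is unchanged.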
