Separate What You Describe: Language-Queried Audio Source Separation

03/28/2022
by Xubo Liu, et al.

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query describing that source (e.g., "a man tells a joke followed by people laughing"). A unique challenge in LASS lies in the complexity of natural language descriptions and their relation to the audio sources. To address this, we propose LASS-Net, an end-to-end neural network trained to jointly process acoustic and linguistic information and to separate, from an audio mixture, the target source that is consistent with the language query. We evaluate the proposed system on a dataset created from the AudioCaps dataset. Experimental results show that LASS-Net achieves considerable improvements over baseline methods. Furthermore, we observe that LASS-Net achieves promising generalization results when diverse human-annotated descriptions are used as queries, indicating its potential for real-world use. The separated audio samples and source code are available at https://liuxubo717.github.io/LASS-demopage.
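To make the task concrete, here is a minimal, hypothetical sketch of one common way a language-queried separator could be structured: a text query is turned into an embedding, the embedding modulates a time-frequency soft mask, and the mask is applied to the mixture's magnitude spectrogram. This is NOT LASS-Net; the abstract does not specify the architecture. The hashed bag-of-words `embed_query` (a stand-in for a real text encoder), the FiLM-style `film_params` conditioning, and the untrained random weights are all illustrative assumptions.

```python
import hashlib
import math
import random


def embed_query(text: str, dim: int = 16) -> list[float]:
    """Hypothetical stand-in for a text encoder: signed hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        idx = digest[0] % dim                        # bucket chosen by the hash
        sign = 1.0 if digest[1] % 2 == 0 else -1.0   # signed hashing reduces collisions' bias
        vec[idx] += sign
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def film_params(query: list[float], n_freq: int, seed: int = 0):
    """Project the query embedding to a per-frequency scale (gamma) and shift (beta).

    Assumption: FiLM-style conditioning with random, untrained weights.
    A real system would learn these projections from data.
    """
    rng = random.Random(seed)
    w_gamma = [[rng.gauss(0.0, 0.5) for _ in query] for _ in range(n_freq)]
    w_beta = [[rng.gauss(0.0, 0.5) for _ in query] for _ in range(n_freq)]
    gamma = [1.0 + sum(w * q for w, q in zip(row, query)) for row in w_gamma]
    beta = [sum(w * q for w, q in zip(row, query)) for row in w_beta]
    return gamma, beta


def separate(mixture_mag: list[list[float]], query_text: str):
    """Estimate a query-conditioned soft mask in [0, 1] and apply it to the mixture magnitude.

    mixture_mag is a list of frames, each a list of non-negative magnitudes per frequency bin.
    """
    n_freq = len(mixture_mag[0])
    gamma, beta = film_params(embed_query(query_text), n_freq)
    estimate, mask = [], []
    for frame in mixture_mag:
        # Sigmoid of a query-modulated log-magnitude feature gives a soft mask per bin.
        m_row = [
            1.0 / (1.0 + math.exp(-(g * math.log1p(x) + b)))
            for x, g, b in zip(frame, gamma, beta)
        ]
        mask.append(m_row)
        estimate.append([m * x for m, x in zip(m_row, frame)])
    return estimate, mask


# Usage: a toy 2-frame, 4-bin magnitude "spectrogram" of a mixture.
mixture = [[0.5, 2.0, 0.1, 3.0], [1.0, 0.0, 4.0, 0.2]]
estimate, mask = separate(mixture, "a man tells a joke followed by people laughing")
```

Because the mask is bounded in [0, 1], the estimated source can only attenuate the mixture in each bin, never amplify it; changing the query text changes the mask. In an actual end-to-end system, the text encoder, the conditioning, and the mask estimator would be trained jointly on mixtures paired with descriptions.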

Related research
08/19/2019

Audio query-based music source separation

In recent years, music source separation has been one of the most intens...
05/11/2023

Universal Source Separation with Weakly Labelled Data

Universal source separation (USS) is a fundamental research task for com...
03/28/2023

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

We propose a self-supervised approach for learning to perform audio sour...
09/20/2021

Acoustic Echo Cancellation using Residual U-Nets

This paper presents an acoustic echo canceler based on a U-Net convoluti...
04/28/2021

AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries

This paper proposes a neural network that performs audio transformations...
07/09/2022

Learning to Separate Voices by Spatial Regions

We consider the problem of audio voice separation for binaural applicati...
03/23/2018

Generalization Challenges for Neural Architectures in Audio Source Separation

Recent work has shown that recurrent neural networks can be trained to s...
