Speech enhancement with weakly labelled data from AudioSet

02/19/2021
by Qiuqiang Kong, et al.

Speech enhancement is the task of improving the intelligibility and perceptual quality of degraded speech signals. Recently, neural network-based methods have been applied to speech enhancement. However, many of these methods require paired noisy and clean speech for training. We propose a speech enhancement framework that can be trained with the large-scale, weakly labelled AudioSet dataset. Weakly labelled data contain only the audio tags of audio clips, not the onset or offset times of speech. We first apply pretrained audio neural networks (PANNs) to detect anchor segments that contain speech or sound events in audio clips. We then randomly mix two detected anchor segments, one containing speech and one containing sound events, to form a mixture, and build a conditional source separation network that uses the PANNs predictions as soft conditions for speech enhancement. At inference, we input a noisy speech signal together with the one-hot encoding of "Speech" as the condition, and the trained system predicts the enhanced speech. Our system achieves a PESQ of 2.28 and an SSNR of 8.75 dB on the VoiceBank-DEMAND dataset, outperforming the previous SEGAN system, which achieved 2.16 and 7.73 dB, respectively.
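
The anchor-segment detection, mixing, and conditioning steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sliding-window selection rule, the plain additive mixing, and the class index used for "Speech" are all hypothetical, and the frame-wise probabilities stand in for the output of a PANNs-style sound event detector.

```python
import numpy as np


def find_anchor_start(framewise_prob, segment_frames):
    """Return the start frame of the window with the highest summed probability.

    framewise_prob: 1-D array of per-frame probabilities for one sound class
    (assumed here to come from a PANNs-style detector).
    """
    # Sliding-window sums computed from a cumulative sum.
    csum = np.concatenate([[0.0], np.cumsum(framewise_prob)])
    window_sums = csum[segment_frames:] - csum[:-segment_frames]
    return int(np.argmax(window_sums))


def mix_segments(segment_a, segment_b):
    """Form a training mixture by adding two equal-length waveform segments."""
    return segment_a + segment_b


def one_hot_condition(class_index, num_classes):
    """Build the one-hot condition vector used at inference time."""
    condition = np.zeros(num_classes, dtype=np.float32)
    condition[class_index] = 1.0
    return condition


# Toy usage with made-up values.
speech_prob = np.array([0.1, 0.1, 0.9, 0.8, 0.9, 0.1])
start = find_anchor_start(speech_prob, segment_frames=3)  # window covering frames 2..4

rng = np.random.default_rng(0)
speech_segment = rng.standard_normal(16000)  # placeholder for a detected speech anchor segment
noise_segment = rng.standard_normal(16000)   # placeholder for a detected sound-event anchor segment
mixture = mix_segments(speech_segment, noise_segment)

SPEECH_INDEX = 0    # assumption: position of "Speech" in the class list
NUM_CLASSES = 527   # number of AudioSet sound classes
condition = one_hot_condition(SPEECH_INDEX, NUM_CLASSES)
```

During training, the soft PANNs predictions for the target segment would take the place of the one-hot vector; the one-hot "Speech" condition is used only at inference, as the abstract states.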

