A Teacher-student Framework for Unsupervised Speech Enhancement Using Noise Remixing Training and Two-stage Inference

10/27/2022
by Li-Wei Chen, et al.

The lack of clean speech is a practical challenge to the development of speech enhancement systems: neural network models must be trained in an unsupervised manner, and there is an inevitable mismatch between their training criterion and the evaluation metric. In response, we propose a teacher-student training strategy that improves on the previously proposed noisy-target training (NyTT) and does not require any subjective or objective speech quality metric as a learning reference. Because homogeneity between the in-domain noise and the extraneous noise is the key to the effectiveness of NyTT, we train various student models by remixing either the teacher model's estimated speech and noise (for clean-target training) or the raw noisy speech and the teacher model's estimated noise (for noisy-target training). We use the NyTT model as the initial teacher model. Experimental results show that the proposed method outperforms several baselines, especially with two-stage inference, in which clean speech is derived successively through the bootstrap model and the final student model.
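The two remixing modes and the two-stage inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: waveforms are plain Python lists, the teacher and bootstrap models are arbitrary callables that map a waveform to an enhanced waveform, the teacher's estimated noise is taken as the residual between the noisy input and the teacher's speech estimate, and estimated noises are permuted across the batch before remixing. The names `remix_batch` and `two_stage_inference` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
# Hypothetical model interface: maps a waveform to an enhanced waveform.
Enhancer = Callable[[Signal], Signal]


def add(a: Sequence[float], b: Sequence[float]) -> Signal:
    """Sample-wise sum of two equal-length waveforms."""
    if len(a) != len(b):
        raise ValueError("waveforms must have equal length")
    return [x + y for x, y in zip(a, b)]


def subtract(a: Sequence[float], b: Sequence[float]) -> Signal:
    """Sample-wise difference a - b of two equal-length waveforms."""
    if len(a) != len(b):
        raise ValueError("waveforms must have equal length")
    return [x - y for x, y in zip(a, b)]


def remix_batch(noisy: List[Signal], teacher: Enhancer, mode: str,
                seed: int = 0) -> List[Tuple[Signal, Signal]]:
    """Build (input, target) pairs for training a student model.

    mode="clean": input = s_hat[i] + n_hat[j], target = s_hat[i]
    mode="noisy": input = x[i]     + n_hat[j], target = x[i]
    Assumptions: n_hat = x - s_hat, and j comes from a random
    permutation of the batch indices.
    """
    if mode not in ("clean", "noisy"):
        raise ValueError("mode must be 'clean' or 'noisy'")
    speech_hat = [teacher(x) for x in noisy]
    noise_hat = [subtract(x, s) for x, s in zip(noisy, speech_hat)]
    perm = list(range(len(noisy)))
    random.Random(seed).shuffle(perm)  # pair each utterance with another noise estimate
    pairs = []
    for i, j in enumerate(perm):
        base = speech_hat[i] if mode == "clean" else noisy[i]
        pairs.append((add(base, noise_hat[j]), base))
    return pairs


def two_stage_inference(x: Signal, bootstrap: Enhancer, student: Enhancer) -> Signal:
    """Enhance x with the bootstrap model, then refine with the final student."""
    return student(bootstrap(x))
```

For example, with a toy teacher `lambda x: [0.5 * v for v in x]`, `two_stage_inference([2.0, 4.0], teacher, teacher)` returns `[0.5, 1.0]`. In a real setup the callables would wrap trained networks, and the student would be optimized on the pairs returned by `remix_batch`.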
