A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement

02/23/2023
by Zhepei Wang, et al.

In this study, we present an approach to training a single speech enhancement network that can perform both personalized and non-personalized speech enhancement. This is achieved by incorporating a frame-wise conditioning input that specifies the type of enhancement output. To improve the quality of the enhanced output and mitigate oversuppression, we experiment with re-weighting frames according to the presence or absence of speech activity and with applying augmentations to speaker embeddings. By training in a multi-task learning setting, we show empirically that the proposed unified model obtains promising results on both personalized and non-personalized speech enhancement benchmarks and reaches performance similar to that of models trained specifically for either task. The strong performance of the proposed method demonstrates that the unified model is a more economical alternative to maintaining separate task-specific models at inference time.
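The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas it names: a frame-wise conditioning input and a loss that re-weights frames by speech activity. Everything here is an assumption made for illustration: the function names `add_mode_flag` and `vad_weighted_loss`, the encoding of the mode as a 0/1 value appended to each frame, and the weighted-average form of the loss. None of it is taken from the paper.

```python
from typing import List, Sequence


def add_mode_flag(
    frames: Sequence[Sequence[float]],
    personalized: Sequence[bool],
) -> List[List[float]]:
    """Append a per-frame conditioning value to each feature vector.

    Assumption: 1.0 requests personalized output for that frame and
    0.0 requests non-personalized output.
    """
    if len(frames) != len(personalized):
        raise ValueError("need exactly one mode flag per frame")
    return [
        list(frame) + [1.0 if flag else 0.0]
        for frame, flag in zip(frames, personalized)
    ]


def vad_weighted_loss(
    frame_losses: Sequence[float],
    speech_active: Sequence[bool],
    speech_weight: float,
    nonspeech_weight: float,
) -> float:
    """Weighted average of per-frame losses.

    Frames marked speech-active use speech_weight; all other frames use
    nonspeech_weight. The weight values are hypothetical tuning knobs.
    """
    if len(frame_losses) != len(speech_active):
        raise ValueError("need exactly one activity label per frame")
    weights = [
        speech_weight if active else nonspeech_weight
        for active in speech_active
    ]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * loss for w, loss in zip(weights, frame_losses))
    return weighted_sum / total_weight
```

With equal weights the loss reduces to a plain mean over frames; unequal weights shift emphasis toward speech-active or non-speech frames, which is the kind of knob the re-weighting described above would tune.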


