Speech MOS multi-task learning and rater bias correction

12/04/2022
by Haleh Akrami, et al.

Perceptual speech quality is an important performance metric for teleconferencing applications. The mean opinion score (MOS) is standardized for the perceptual evaluation of speech quality and is obtained by asking listeners to rate the quality of a speech sample. Recently, there has been increasing research interest in developing models for estimating MOS blindly. Here we propose a multi-task framework to include additional labels and data in training to improve the performance of a blind MOS estimation model. Experimental results indicate that the proposed model can be trained to jointly estimate MOS, reverberation time (T60), and clarity (C50) by combining two disjoint data sets in training, one containing only MOS labels and the other containing only T60 and C50 labels. Furthermore, we use a semi-supervised framework to combine two MOS data sets in training, one containing only MOS labels (per ITU-T Recommendation P.808), and the other containing separate scores for speech signal, background noise, and overall quality (per ITU-T Recommendation P.835). Finally, we present preliminary results for addressing individual rater bias in the MOS labels.
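One common way to train a single model on disjoint data sets, where each sample carries labels for only some of the tasks, is to mask the missing labels out of the loss. The sketch below illustrates that general idea for MOS, T60, and C50. It is not the authors' implementation: the abstract does not specify the loss or the architecture, and the function name `masked_multitask_mse`, the task order in `TASKS`, the use of NaN to mark a missing label, and the unweighted average across tasks are all assumptions made for illustration.

```python
import math

# Hypothetical task order; the abstract names the tasks but not their layout.
TASKS = ("mos", "t60", "c50")


def masked_multitask_mse(preds, targets):
    """Per-task mean squared error over a batch, skipping missing labels.

    preds, targets: lists of rows, one value per task in TASKS order.
    A missing label is marked with NaN in targets (an assumption of this sketch).
    Returns (per_task, total), where total is the unweighted mean over the
    tasks that have at least one label in the batch.
    """
    sums = [0.0] * len(TASKS)
    counts = [0] * len(TASKS)
    for pred_row, target_row in zip(preds, targets):
        for k, (p, t) in enumerate(zip(pred_row, target_row)):
            if not math.isnan(t):  # only labeled entries contribute
                sums[k] += (p - t) ** 2
                counts[k] += 1
    per_task = {
        name: sums[k] / counts[k]
        for k, name in enumerate(TASKS)
        if counts[k] > 0
    }
    total = sum(per_task.values()) / len(per_task) if per_task else 0.0
    return per_task, total


NAN = float("nan")
# A mixed batch: the first row comes from a MOS-only data set,
# the second from a data set labeled only with T60 and C50.
preds = [[3.0, 0.5, 10.0], [4.0, 0.3, 15.0]]
targets = [[3.5, NAN, NAN], [NAN, 0.4, 13.0]]
per_task, total = masked_multitask_mse(preds, targets)
print(per_task, total)
```

Because MOS, T60 (seconds), and C50 (dB) live on different scales, a practical setup would likely normalize the targets or weight the per-task terms rather than average them directly; the unweighted mean here only keeps the example short.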


research
08/18/2023

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model

This study proposes a multi-task pseudo-label learning (MPL)-based non-i...
research
04/13/2022

Predicting score distribution to improve non-intrusive speech quality estimation

Deep noise suppressors (DNS) have become an attractive solution to remov...
research
11/04/2021

InQSS: a speech intelligibility assessment model using a multi-task learning network

Speech intelligibility assessment models are essential tools for researc...
research
10/18/2021

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Personalized speech enhancement (PSE) models utilize additional cues, su...
research
07/13/2019

Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion

Despite the widespread use of supervised deep learning methods for affec...
research
10/23/2019

Semi-supervised Multi-domain Multi-task Training for Metastatic Colon Lymph Node Diagnosis From Abdominal CT

The diagnosis of the presence of metastatic lymph nodes from abdominal c...
