MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

04/07/2022
by   Ryandhimas E. Zezario, et al.
5

Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies elaborate deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-tasking learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/18/2023

Utilizing Whisper to Enhance Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

Automated assessment of speech intelligibility in hearing aid (HA) devic...
research
08/18/2023

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model

This study proposes a multi-task pseudo-label learning (MPL)-based non-i...
research
04/07/2022

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

Improving the user's hearing ability to understand speech in noisy envir...
research
11/03/2021

Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

In this study, we propose a cross-domain multi-objective speech assessme...
research
06/01/2023

How to Estimate Model Transferability of Pre-Trained Speech Models?

In this work, we introduce a “score-based assessment” framework for esti...
research
11/09/2020

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

The calculation of most objective speech intelligibility assessment metr...
research
08/26/2018

Analyzing Learned Representations of a Deep ASR Performance Prediction Model

This paper addresses a relatively new task: prediction of ASR performanc...

Please sign up or login with your details

Forgot password? Click here to reset