Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

09/18/2023
by   Lester Phillip Violeta, et al.

We propose a novel framework for electrolaryngeal (EL) speech intelligibility enhancement through the use of robust linguistic encoders. Pretraining and fine-tuning approaches have proven to work well in this task, but in most cases, mismatches between the datasets used in each stage, such as a speech type mismatch (electrolaryngeal vs. typical) or a speaker mismatch, can degrade the conversion performance of this framework. To resolve this issue, we propose a linguistic encoder robust enough to project both EL and typical speech into the same latent space while still extracting accurate linguistic information, creating a unified representation that reduces the speech type mismatch. Furthermore, we introduce HuBERT output features to the proposed framework to reduce the speaker mismatch, making it possible to effectively use a large-scale parallel dataset during pretraining. We show that, compared to the conventional framework using mel-spectrogram input and output features, the proposed framework enables the model to synthesize more intelligible and natural-sounding speech, as shown by a significant 16% improvement in character error rate and a 0.83 improvement in naturalness score.
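For readers unfamiliar with the intelligibility metric mentioned above, the following is a minimal sketch of how character error rate (CER) is commonly computed: the character-level edit distance between a reference transcript and a recognized transcript, divided by the reference length. The function names `edit_distance` and `character_error_rate` are illustrative assumptions, not taken from the paper, and the paper's actual evaluation pipeline (speech recognizer, text normalization) is not specified here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = character edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substitution over five reference characters.
    print(character_error_rate("hello", "hallo"))  # 0.2
```

A lower CER on transcripts of the converted speech indicates that the speech is easier to recognize, which is why it serves as a proxy for intelligibility.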

research
07/31/2020

Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

A novel framework for meeting transcription using asynchronous microphon...
research
07/11/2022

The HCCL System for the NIST SRE21

This paper describes the systems developed by the HCCL team for the NIST...
research
06/15/2022

End-to-End Voice Conversion with Information Perturbation

The ideal goal of voice conversion is to convert the source speaker's sp...
research
05/22/2020

NAUTILUS: a Versatile Voice Cloning System

We introduce a novel speech synthesis system, called NAUTILUS, that can ...
research
11/02/2022

Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition

Research on automatic speech recognition (ASR) systems for electrolaryng...
research
06/01/2023

Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations

Self-Supervised Learning (SSL) has allowed leveraging large amounts of u...
research
04/07/2023

ArmanTTS single-speaker Persian dataset

TTS, or text-to-speech, is a complicated process that can be accomplishe...