Multilingual context-based pronunciation learning for Text-to-Speech

07/31/2023
by Giulia Comini, et al.

Phonetic information and linguistic knowledge are essential components of a text-to-speech (TTS) front-end. For a given language, a lexicon can be collected offline, and grapheme-to-phoneme (G2P) relationships are usually modeled in order to predict the pronunciation of out-of-vocabulary (OOV) words. Additionally, post-lexical phonology, often defined in the form of rule-based systems, is used to correct pronunciation within or between words. In this work we showcase a multilingual unified front-end system that addresses any pronunciation-related task, each of which is typically handled by a separate module. We evaluate the proposed model on G2P conversion and other language-specific challenges, such as homograph and polyphone disambiguation, post-lexical rules, and implicit diacritization. We find that the multilingual model is competitive across languages and tasks; however, some trade-offs exist when compared to equivalent monolingual solutions.
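
As a rough illustration of the conventional modular front-end that the abstract contrasts with a unified model, the sketch below chains a lexicon lookup, a fallback for OOV words, and one post-lexical rule. It is not the authors' system: the tiny lexicon, the letter-to-phoneme table standing in for a trained G2P model, the vowel set, and the single English rule ("the" becomes /ði/ before a vowel) are all hypothetical and chosen only to keep the example small.

```python
# Toy illustration only: every table and rule here is invented for this sketch
# and is not taken from the paper.
LEXICON = {
    "the": ["ð", "ə"],
    "apple": ["æ", "p", "ə", "l"],
    "cat": ["k", "æ", "t"],
}

# Hypothetical one-letter-to-one-phoneme table standing in for a trained G2P model.
LETTER_TO_PHONEME = {
    "a": "æ", "b": "b", "c": "k", "d": "d", "e": "ɛ", "g": "g",
    "i": "ɪ", "o": "ɒ", "p": "p", "t": "t", "z": "z",
}

VOWELS = {"æ", "ɛ", "ɪ", "ɒ", "ə", "i"}


def g2p_fallback(word):
    """Predict a pronunciation for an OOV word, letter by letter."""
    return [LETTER_TO_PHONEME[ch] for ch in word if ch in LETTER_TO_PHONEME]


def lookup(word):
    """Use the lexicon when possible, otherwise fall back to G2P."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    return g2p_fallback(word)


def apply_post_lexical(words, prons):
    """Apply one between-word rule: 'the' is /ði/ before a vowel-initial word."""
    out = [list(p) for p in prons]
    for i in range(len(words) - 1):
        next_pron = out[i + 1]
        if words[i].lower() == "the" and next_pron and next_pron[0] in VOWELS:
            out[i] = ["ð", "i"]
    return out


def pronounce(sentence):
    """Run the three separate modules in sequence over a whitespace-split sentence."""
    words = sentence.split()
    prons = [lookup(w) for w in words]
    return apply_post_lexical(words, prons)
```

For instance, `pronounce("the apple")` exercises the post-lexical rule, while `lookup("zap")` takes the OOV path. Each stage here is a separate hand-written module; the abstract's claim is that a single multilingual model can cover these tasks together.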

research
11/22/2018

Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes

We present two end-to-end models: Audio-to-Byte (A2B) and Byte-to-Audio ...
research
05/07/2021

Generalising Multilingual Concept-to-Text NLG with Language Agnostic Delexicalisation

Concept-to-text Natural Language Generation is the task of expressing an...
research
11/06/2017

Towards Language-Universal End-to-End Speech Recognition

Building speech recognizers in multiple languages typically involves rep...
research
07/14/2021

FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

In this paper, we describe our end-to-end multilingual speech translatio...
research
05/14/2019

Multilingual Factor Analysis

In this work we approach the task of learning multilingual word represen...
research
05/27/2022

UAlberta at SemEval 2022 Task 2: Leveraging Glosses and Translations for Multilingual Idiomaticity Detection

We describe the University of Alberta systems for the SemEval-2022 Task ...
research
07/05/2023

Multilingual Controllable Transformer-Based Lexical Simplification

Text is by far the most ubiquitous source of knowledge and information a...
