Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

04/25/2023
by Chenpeng Du, et al.

In this paper, we describe the systems developed by the SJTU X-LANCE team for the LIMMITS 2023 Challenge, focusing on the system that won on naturalness in track 1. The aim of the challenge is to build a multi-speaker, multi-lingual text-to-speech (TTS) system for Marathi, Hindi and Telugu. The given dataset contains one male and one female speaker for each language. In track 1, only 5 hours of data per speaker may be selected to train the TTS model. Our system is based on the recently proposed VQTTS, which uses vector-quantized (VQ) acoustic features rather than mel-spectrograms. We add speaker embeddings and language embeddings to VQTTS to control speaker and language information. In the cross-lingual evaluations, where speech must be synthesized in the voice of a speaker who is not native to the target language, we provide a native speaker's embedding to the acoustic model and the target speaker's embedding to the vocoder. In the subjective MOS listening test on naturalness, our system achieves a score of 4.77, which ranks first.
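The cross-lingual routing described above can be sketched as a small selection function. This is only an illustrative sketch, not the authors' implementation: the speaker IDs, the `NATIVE_SPEAKERS` table, the `Conditioning` type and the `select_conditioning` function are all hypothetical names, and the abstract does not say which native speaker is chosen for the acoustic model, so the sketch simply takes the first one listed.

```python
from dataclasses import dataclass

# Hypothetical speaker inventory: one female and one male speaker per language,
# mirroring the dataset layout described in the abstract. IDs are made up.
NATIVE_SPEAKERS = {
    "marathi": ("mr_f", "mr_m"),
    "hindi": ("hi_f", "hi_m"),
    "telugu": ("te_f", "te_m"),
}


@dataclass(frozen=True)
class Conditioning:
    """Which IDs are looked up as embeddings for each model component."""
    language: str                # language embedding for the input text
    acoustic_model_speaker: str  # speaker embedding given to the acoustic model
    vocoder_speaker: str         # speaker embedding given to the vocoder


def select_conditioning(text_language: str, target_speaker: str) -> Conditioning:
    """Pick speaker IDs for the acoustic model and the vocoder.

    If the target speaker is native to the text language, both components
    receive the target speaker. Otherwise the acoustic model receives a
    native speaker of the text language while the vocoder still receives
    the target speaker.
    """
    if text_language not in NATIVE_SPEAKERS:
        raise ValueError(f"unknown language: {text_language}")
    natives = NATIVE_SPEAKERS[text_language]
    if target_speaker in natives:
        acoustic_speaker = target_speaker
    else:
        # Assumption: the choice of native speaker is not specified in the
        # abstract; taking the first entry is an arbitrary placeholder.
        acoustic_speaker = natives[0]
    return Conditioning(text_language, acoustic_speaker, target_speaker)


if __name__ == "__main__":
    # Hindi text in a Telugu speaker's voice (cross-lingual case).
    print(select_conditioning("hindi", "te_m"))
    # Hindi text in a Hindi speaker's voice (native case).
    print(select_conditioning("hindi", "hi_m"))
```

In the cross-lingual call, the two components receive different speaker IDs; in the native call, they receive the same one.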


