Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion

09/30/2020
by Che-Jui Chang, et al.

Cross-lingual voice conversion (VC) is a task that aims to synthesize target voices with the same content when the source and target speakers speak different languages. Its challenge lies in the fact that the source and target data are naturally non-parallel, and it is even more difficult to bridge the gap between languages when no transcriptions are provided. In this paper, we focus on knowledge transfer from monolingual ASR to cross-lingual VC, in order to address the content mismatch problem. To achieve this, we first train a monolingual acoustic model for the source language, use it to extract phonetic features for all the speech in the VC dataset, and then train a Seq2Seq conversion model to predict the mel-spectrograms. We successfully address cross-lingual VC without any transcription or language-specific knowledge for foreign speech. We evaluate this approach on the Voice Conversion Challenge 2020 datasets and show that our speaker-dependent conversion model outperforms the zero-shot baseline, achieving MOS values of 3.83 and 3.54 in speech quality and speaker similarity, respectively, for cross-lingual conversion. Compared to the cascade ASR-TTS method, our proposed method significantly reduces the MOS drop between intra- and cross-lingual conversion.
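
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "acoustic model" here is a fixed random projection followed by a softmax, the "conversion model" is a per-phone average of target mel frames standing in for the Seq2Seq model, and all names (`make_acoustic_model`, `train_conversion_model`, `N_MELS`, `N_PHONES`) and the use of mel frames as the acoustic model's input are assumptions made for illustration. It only shows that the same source-language feature extractor is applied to all speech, with no transcriptions involved.

```python
import math
import random

N_MELS = 4    # toy mel dimension (assumption; real systems use many more bins)
N_PHONES = 3  # toy size of the source-language phonetic inventory (assumption)


def make_acoustic_model(seed=0):
    """Stand-in for a trained monolingual acoustic model (hypothetical).

    Returns a function mapping mel frames to per-frame phonetic posteriors.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(N_MELS)] for _ in range(N_PHONES)]

    def extract(mel_frames):
        feats = []
        for frame in mel_frames:
            logits = [sum(w * x for w, x in zip(row, frame)) for row in weights]
            top = max(logits)
            exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
            total = sum(exps)
            feats.append([e / total for e in exps])
        return feats

    return extract


def train_conversion_model(phonetic_feats, target_mels):
    """Toy speaker-dependent model: average target mel frame per top phone.

    This replaces the Seq2Seq model from the abstract with a lookup table.
    """
    sums, counts = {}, {}
    for post, mel in zip(phonetic_feats, target_mels):
        k = max(range(len(post)), key=post.__getitem__)
        acc = sums.setdefault(k, [0.0] * len(mel))
        for i, v in enumerate(mel):
            acc[i] += v
        counts[k] = counts.get(k, 0) + 1
    table = {k: [v / counts[k] for v in acc] for k, acc in sums.items()}
    n = len(target_mels)
    # Global mean frame, used for phones never observed in the target data.
    fallback = [sum(m[i] for m in target_mels) / n for i in range(N_MELS)]

    def convert(feats):
        out = []
        for post in feats:
            k = max(range(len(post)), key=post.__getitem__)
            out.append(list(table.get(k, fallback)))
        return out

    return convert


def toy_utterance(rng, n_frames):
    """Random frames standing in for real mel-spectrograms."""
    return [[rng.gauss(0, 1) for _ in range(N_MELS)] for _ in range(n_frames)]


rng = random.Random(1)
extract = make_acoustic_model()           # step 1: source-language acoustic model
target_speech = toy_utterance(rng, 50)    # target speaker, possibly a foreign language
source_speech = toy_utterance(rng, 20)    # utterance to convert

# Step 2: phonetic features for the target data; step 3: fit the conversion model.
convert = train_conversion_model(extract(target_speech), target_speech)

# Conversion: source speech -> phonetic features -> target-speaker mel frames.
converted = convert(extract(source_speech))
print(len(converted), len(converted[0]))
```

In a real system the converted mel-spectrogram would still need a vocoder to produce a waveform, which this sketch leaves out.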


research
08/11/2020

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN

Cross-lingual voice conversion aims to change source speaker's voice to ...
research
12/28/2020

Building Multi lingual TTS using Cross Lingual Voice Conversion

In this paper we propose a new cross-lingual Voice Conversion (VC) appro...
research
08/28/2020

Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

The voice conversion challenge is a bi-annual scientific event held to c...
research
10/08/2020

FastVC: Fast Voice Conversion with non-parallel data

This paper introduces FastVC, an end-to-end model for fast Voice Convers...
research
07/04/2022

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

In this paper, we propose GlowVC: a multilingual multi-speaker flow-base...
research
10/06/2020

The Academia Sinica Systems of Voice Conversion for VCC2020

This paper describes the Academia Sinica systems for the two tasks of Vo...
research
11/12/2021

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

We present a method for cross-lingual training an ASR system using absol...
