We compare using a PHOIBLE-based phone mapping method and using phonolog...
We compare phone labels and articulatory features as input for cross-lin...
We train a MOS prediction model based on wav2vec 2.0 using the open-acce...
Cross-lingual synthesis can be defined as the task of letting a speaker
...
Recent advances in neural TTS have led to models that can produce
high-q...