Audio Deepfake Detection (ADD) aims to detect the fake audio generated b...
Accented text-to-speech (TTS) synthesis seeks to generate speech with an...
Conversational Text-to-Speech (TTS) aims to synthesize an utterance with ...
Multimodal emotion recognition leverages complementary information acros...
Cyrillic and Traditional Mongolian are the two main members of the Mongo...
This paper introduces a high-quality open-source text-to-speech (TTS) sy...
Accented text-to-speech (TTS) synthesis seeks to generate speech with an...
Emotion classification of speech and assessment of the emotion strength ...
Deep learning has shown great potential for speech separation, especia...
Tacotron-based end-to-end speech synthesis has shown remarkable voice qu...
We propose a novel training strategy for Tacotron-based text-to-speech (...
In the image inpainting task, the ability to repair both high-frequency ...
It is very challenging for speech enhancement methods to achieve robust...
In single-channel speech enhancement, methods based on full-band spectra...
Tacotron-based text-to-speech (TTS) systems directly synthesize speech f...
While neural end-to-end text-to-speech (TTS) is superior to conventional...
Both reverberation and additive noises degrade the speech quality and in...