In this paper, we present a novel method for phoneme-level prosody contr...
A large part of the expressive speech synthesis literature focuses on le...
This paper proposes an Expressive Speech Synthesis model that utilizes t...
The gender of a voice assistant or any voice user interface is a central...
Current state-of-the-art methods for automatic synthetic speech evaluati...
This paper presents a method for end-to-end cross-lingual text-to-speech...
A text-to-speech (TTS) model typically factorizes speech attributes such...
Existing singing voice synthesis (SVS) models are usually trained on sin...
Voice cloning is a difficult task which requires robust and informative ...
In this work, we present the SOMOS dataset, the first large-scale mean o...
This paper presents a method for controlling the prosody at the phoneme ...
This paper presents a method for phoneme-level prosody control of F0 and...
In this paper, a text-to-rapping/singing system is introduced, which can...
The idea of using phonological features instead of phonemes as input to ...
This paper presents an end-to-end text-to-speech system with low latency...