A text-to-speech (TTS) model typically factorizes speech attributes such...
Existing singing voice synthesis models (SVS) are usually trained on sin...
Voice cloning is a difficult task which requires robust and informative
...
In this work, we present the SOMOS dataset, the first large-scale mean
o...