We propose UnitSpeech, a speaker-adaptive speech synthesis method that
f...
Despite the fact that text-to-video (TTV) model has recently achieved
re...
We propose Guided-TTS 2, a diffusion-based generative model for high-qua...
Most neural text-to-speech (TTS) models require <speech, transcript> pai...
Despite advances in neural network language model, the representation
de...
Denoising diffusion probabilistic models have been recently proposed to
...
Generative adversarial networks (GANs) with clustered latent spaces can
...