Generalisation – the ability of a model to perform well on unseen data –...
In this paper we introduce a first attempt on understanding how a
non-au...
Speaker embeddings represent a means to extract representative vectorial...
The task of converting text input into video content is becoming an impo...
Multi-speaker spoken datasets enable the creation of text-to-speech synt...
Building multispeaker neural network-based text-to-speech synthesis syst...
The task of video-to-speech aims to translate silent video of lip moveme...
Quantifying the confidence (or conversely the uncertainty) of a predicti...
Deep learning enables the development of efficient end-to-end speech
pro...