We work to create a multilingual speech synthesis system which can gener...
Despite recent advances in generative modeling for text-to-speech synthe...
Speech-to-text alignment is a critical component of neural text-to-speech...
Relation extraction is the task of identifying relation instances between...
Sounds are essential to how humans perceive and interact with the world ...
The largest source of sound events is web videos. Most videos lack sound...
In this paper, we focus on the problem of content-based retrieval for au...
In this paper, we present our work on Task 1 Acoustic Scene Classificat...