Modeling sounds emitted from physical object interactions is critical fo...
Are end-to-end text-to-speech (TTS) models over-parametrized? To what ex...
Prosody plays an important role in characterizing the style of a speaker...
Recent work on speech self-supervised learning (speech SSL) demonstrated...
Contemporary speech enhancement predominantly relies on audio transforms...
Speech information can be roughly decomposed into four components: langu...
Non-parallel many-to-many voice conversion remains an interesting but ch...
There are two major paradigms of white-box adversarial attacks that atte...
Non-parallel many-to-many voice conversion, as well as zero-shot voice c...
Multi-channel speech enhancement with ad-hoc sensors has been a challeng...