Recent work in the field of speech enhancement (SE) has involved the use...
Self-supervised speech representations (SSSRs) have been successfully ap...
Speech emotion recognition (SER) is vital for obtaining emotional
intell...
Exploiting cross-lingual resources is an effective way to compensate for...
Speech separation remains an important area of multi-speaker signal
proc...
Knowledge distillation has been widely used for model compression and do...
Recent work in the domain of speech enhancement has explored the use of
...
State-of-the-art speaker verification frameworks have typically focused ...
End-to-end automatic speech recognition (ASR) models aim to learn a
gene...
Speech separation models are used for isolating individual speakers in m...
This paper proposes an unsupervised data selection method by using a
sub...
Multilingual speech recognition has drawn significant attention as an
ef...
Multilingual automatic speech recognition (ASR) systems mostly benefit l...
For speech emotion datasets, it has been difficult to acquire large
quan...
End-to-end automatic speech recognition (ASR) models aim to learn a
gene...
Speech dereverberation is an important stage in many speech technology
a...
Speech dereverberation is often an important requirement in robust speec...
Training of speech enhancement systems often does not incorporate knowle...
This paper proposes an adaptation method for end-to-end speech recogniti...
Identifying multiple speakers without knowing where a speaker's voice is...
Many-to-many voice conversion with non-parallel training data has seen
s...
Anomalous audio in speech recordings is often caused by speaker voice
di...
Unsupervised representation learning of speech has been of keen interest...
Many applications of speech technology require more and more audio data....
While the use of deep neural networks has significantly boosted speaker
...
Identifying multiple speakers without knowing where a speaker's voice is...
In this work, a speaker embedding de-mixing approach is proposed. Instea...
In this paper, a novel architecture for speaker recognition is proposed ...
In this paper, a hierarchical attention network to generate utterance-le...
Embedding acoustic information into fixed length representations is of
i...
In this paper, a novel framework to tackle speaker recognition using a
tw...
Selecting in-domain data from a large pool of diverse and out-of-domain ...
Huge amounts of digital videos are being produced and broadcast every da...
We describe the University of Sheffield system for participation in the ...
This paper presents a new method for the discovery of latent domains in
...
The University of Sheffield (USFD) participated in the International Wor...
Speech recognition systems are often highly domain dependent, a fact wid...
Negative transfer in training of acoustic models for automatic speech
re...