Automatic Speech Recognition (ASR) models need to be optimized for speci...
Large language models have proven themselves highly flexible, able to so...
Large-scale generative models such as GPT and DALL-E have revolutionized...
This paper presents a method for selecting appropriate synthetic speech ...
State space models (SSMs) have recently shown promising results on small...
Interactive voice assistants have been widely used as input interfaces i...
Neural transducers have gained popularity in production ASR systems, ach...
The two most popular loss functions for streaming end-to-end automatic s...
Cross-device federated learning (FL) protects user privacy by collaborat...
Streaming ASR with strict latency constraints is required in many speech...
This document describes version 0.10 of torchaudio: building blocks for ...
This paper improves the streaming transformer transducer for speech reco...
Often, the storage and computational constraints of embedded devices dema...
As speech-enabled devices such as smartphones and smart speakers become ...
How to leverage dynamic contextual information in end-to-end speech reco...
We propose a dynamic encoder transducer (DET) for on-device speech recog...
Recurrent transducer models have emerged as a promising solution for spe...
End-to-end models in general, and Recurrent Neural Network Transducer (R...
There is a growing interest in the speech community in developing Recurr...
Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech...
End-to-end (E2E) systems for automatic speech recognition (ASR), such as...
In this paper, we introduce spatial attention for refining the informati...
Neural transducer-based systems such as RNN Transducers (RNN-T) for auto...
We explore options to use Transformer networks in neural transducer for ...
We propose and evaluate transformer-based acoustic models (AMs) for hybr...