Recent advancements in audio generation have been spurred by the evoluti...
In language modeling based music generation, a generated waveform is
rep...
This paper discusses the challenges of optical character recognition (OC...
Knowing exactly how many data points need to be labeled to achieve a cer...
Ternary and binary neural networks enable multiplication-free computatio...
Several post-training quantization methods have been applied to large
la...
State space models (SSMs) have recently shown promising results on
small...
This paper proposes a hardware-efficient architecture, Linearized Convol...
Self-supervised learning via masked prediction pre-training (MPPT) has s...
There is growing interest in unifying the streaming and full-context
aut...
Streaming ASR with strict latency constraints is required in many speech...
This document describes version 0.10 of torchaudio: building blocks for
...
This paper improves the streaming transformer transducer for speech
reco...
Detection of common events and scenes from audio is useful for extractin...
Hybrid automatic speech recognition (ASR) models are typically sequentia...
On-device speech recognition requires training models of different sizes...
Often, the storage and computational constraints of embedded devices dema...
As speech-enabled devices such as smartphones and smart speakers become
...
How to leverage dynamic contextual information in end-to-end speech
reco...
We propose a dynamic encoder transducer (DET) for on-device speech
recog...
The Bhatnagar-Gross-Krook (BGK) single-relaxation-time collision model f...
Attention-based models have been gaining popularity recently for their s...
In this paper, we summarize the application of transformer and its strea...
This paper proposes an efficient memory transformer Emformer for low lat...
Transformers, originally proposed for natural language processing (NLP)
...
Transformer-based acoustic modeling has achieved great success for both...
Recurrent Neural Networks (RNNs) have dominated language modeling becaus...
Long Short-Term Memory Connectionist Temporal Classification (LSTM-CTC) ...