Large pre-trained speech models are widely used as the de-facto paradigm...
We introduce the Universal Speech Model (USM), a single large model that...
Research on speech-to-speech translation (S2ST) has progressed rapidly i...
Building inclusive speech recognition systems is a crucial step towards
...
We present a simple and effective self-supervised learning approach for
...
This work introduces cross-attention conformer, an attention-based
archi...
We summarize the results of a host of efforts using giant automatic spee...
Motivated by the success of masked language modeling (MLM) in pre-traini...
Streaming end-to-end automatic speech recognition (ASR) systems are wide...
We combine recent advancements in end-to-end speech recognition to
non-a...
End-to-end (E2E) models have shown to outperform state-of-the-art
conven...
Knowledge Distillation is an effective method of transferring knowledge ...
End-to-end (E2E) automatic speech recognition (ASR) models, by now, have...
Streaming end-to-end automatic speech recognition (ASR) models are widel...
Streaming automatic speech recognition (ASR) aims to emit each hypothesi...
We employ a combination of recent developments in semi-supervised learni...
Streaming automatic speech recognition (ASR) aims to emit each hypothesi...
Recent advances of end-to-end models have outperformed conventional mode...
Recently, a semi-supervised learning method known as "noisy student trai...
Recently Transformer and Convolution neural network (CNN) based models h...
In recent years, all-neural end-to-end approaches have obtained
state-of...
Convolutional neural networks (CNN) have shown promising results for
end...
Thus far, end-to-end (E2E) models have not been shown to outperform
stat...
Recently, SpecAugment, an augmentation scheme for automatic speech
recog...
In this paper, we propose to use pre-trained features from end-to-end AS...
End-to-end automatic speech recognition (ASR) models, including both
att...
All-neural end-to-end (E2E) automatic speech recognition (ASR) systems t...
The requirements for many applications of state-of-the-art speech recogn...
Simultaneous machine translation begins to translate each source sentenc...
We present SpecAugment, a simple data augmentation method for speech
rec...
Lingvo is a Tensorflow framework offering a complete solution for
collab...
End-to-end Speech Translation (ST) models have many potential advantages...
Attention-based recurrent neural encoder-decoder models present an elega...
Sequence-to-sequence models with soft attention have been successfully
a...
For decades, context-dependent phonemes have been the dominant sub-word ...
Sequence-to-sequence models, such as attention-based models in automatic...
Having a sequence-to-sequence model which can operate in an online fashi...
Attention-based encoder-decoder architectures such as Listen, Attend, an...
In this paper we document our experiences with developing speech recogni...
Generative models have long been the dominant approach for speech
recogn...
There has recently been significant interest in hard attention models fo...
Sequence-to-sequence models with soft attention had significant success ...