Simplified End-to-End MMI Training and Voting for ASR

03/30/2017
by Lior Fritz, et al.

We consider a simplified speech recognition system that uses the maximum mutual information (MMI) criterion. We suggest end-to-end training using gradient descent, similar to the training of connectionist temporal classification (CTC). We use an MMI criterion with a simple language model in the training stage, and a standard HMM decoder. Our method compares favorably to CTC in terms of performance, robustness, decoding time, disk footprint, and quality of alignments. The good alignments enable a straightforward ensemble method: averaging the predictions of several neural network models that were each trained separately end-to-end. The ensemble method yields a considerable reduction in the word error rate.
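
As an illustration of the ensemble idea in the abstract, the following is a minimal sketch of prediction averaging, not the authors' implementation. It assumes each model emits, for every frame, a probability distribution over the same label set, and that the models' outputs are time-aligned well enough to average frame by frame. The function name `average_posteriors` and the toy values are hypothetical.

```python
def average_posteriors(model_posteriors):
    """Average per-frame label posteriors across several models.

    model_posteriors: list with one entry per model (assumed layout);
    each entry is a list of frames, and each frame is a list of
    probabilities over the same label set.
    """
    if not model_posteriors:
        raise ValueError("need at least one model")
    num_models = len(model_posteriors)
    num_frames = len(model_posteriors[0])
    # Frame-wise averaging only makes sense if all models cover the same frames.
    for posteriors in model_posteriors:
        if len(posteriors) != num_frames:
            raise ValueError("all models must cover the same number of frames")

    averaged = []
    for t in range(num_frames):
        frames_at_t = [posteriors[t] for posteriors in model_posteriors]
        # zip(*...) groups the probabilities of each label across models.
        averaged.append([sum(label_probs) / num_models
                         for label_probs in zip(*frames_at_t)])
    return averaged


# Toy example: two models, two frames, two labels (made-up numbers).
model_a = [[0.9, 0.1], [0.2, 0.8]]
model_b = [[0.7, 0.3], [0.4, 0.6]]
ensemble = average_posteriors([model_a, model_b])
```

The averaged posteriors would then be passed to the decoder in place of a single model's output. Frame-wise averaging is only meaningful when the models agree on where each label occurs in time, which is why the abstract ties the ensemble to the quality of the alignments.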

research
12/05/2021

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Recently, End-to-End (E2E) frameworks have achieved remarkable results o...
research
10/27/2020

Multitask Training with Text Data for End-to-End Speech Recognition

We propose a multitask training method for attention-based end-to-end sp...
research
03/03/2018

On Modular Training of Neural Acoustics-to-Word Model for LVCSR

End-to-end (E2E) automatic speech recognition (ASR) systems directly map...
research
12/11/2019

End-to-End Learning of Geometrical Shaping Maximizing Generalized Mutual Information

GMI-based end-to-end learning is shown to be highly nonconvex. We apply ...
research
10/23/2020

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer

Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end...
research
05/19/2020

Improving Proper Noun Recognition in End-to-End ASR By Customization of the MWER Loss Criterion

Proper nouns present a challenge for end-to-end (E2E) automatic speech r...
research
10/30/2020

Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition

To join the advantages of classical and end-to-end approaches for speech...
