Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

11/09/2020
by   Xiaohui Zhang, et al.

In this work, we measure the accuracy and efficiency of a latency-controlled streaming automatic speech recognition (ASR) application by comprehensively evaluating three popular training criteria: LF-MMI, CTC and RNN-T. Transcribing social media videos in 7 languages, with 3K-14K hours of training data, we conduct large-scale controlled experiments across the criteria using identical datasets and an identical encoder model architecture. We find that RNN-T consistently wins in ASR accuracy, while CTC models excel at inference efficiency. We also selectively examine modeling strategies for the different training criteria, including modeling units, encoder architectures and pre-training. To the best of our knowledge, this is the first comprehensive benchmark of these three widely used training criteria on a large-scale, real-world streaming ASR application across many languages.

research
05/28/2020

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

Recently, there has been a strong push to transition from hybrid models ...
research
08/12/2020

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

Transfer learning (TL) is widely used in conventional hybrid automatic s...
research
11/05/2020

Improving RNN Transducer Based ASR with Auxiliary Tasks

End-to-end automatic speech recognition (ASR) models with a single neura...
research
09/09/2023

Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

Achieving high accuracy with low latency has always been a challenge in ...
research
04/19/2022

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

The two most popular loss functions for streaming end-to-end automatic s...
research
04/02/2021

HMM-Free Encoder Pre-Training for Streaming RNN Transducer

This work describes an encoder pre-training procedure using frame-wise l...
research
06/29/2022

On the Prediction Network Architecture in RNN-T for ASR

RNN-T models have gained popularity in the literature and in commercial ...
