3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition

by   Zhao You, et al.

Recently, Conformer based CTC/AED model has become a mainstream architecture for ASR. In this paper, based on our prior work, we identify and integrate several approaches to achieve further improvements for ASR tasks, which we denote as multi-loss, multi-path and multi-level, summarized as "3M" model. Specifically, multi-loss refers to the joint CTC/AED loss and multi-path denotes the Mixture-of-Experts(MoE) architecture which can effectively increase the model capacity without remarkably increasing computation cost. Multi-level means that we introduce auxiliary loss at multiple level of a deep model to help training. We evaluate our proposed method on the public WenetSpeech dataset and experimental results show that the proposed method provides 12.2 toolkit. On our large scale dataset of 150k hours corpus, the 3M model has also shown obvious superiority over the baseline Conformer model. Code is publicly available at https://github.com/tencent-ailab/3m-asr.


page 1

page 2

page 3

page 4


Multi-Level Mesa

Multi-level Mesa is an extension to support the Python based Agents Base...

End-to-end Speech Recognition with Word-based RNN Language Models

This paper investigates the impact of word-based RNN language models (RN...

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

Multilingual Automatic Speech Recognition (ASR) models have extended the...

From Social to Individuals: a Parsimonious Path of Multi-level Models for Crowdsourced Preference Aggregation

In crowdsourced preference aggregation, it is often assumed that all the...

HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

State-of-the-art ASR systems have achieved promising results by modeling...

SpeechMoE2: Mixture-of-Experts Model with Improved Routing

Mixture-of-experts based acoustic models with dynamic routing mechanisms...

Please sign up or login with your details

Forgot password? Click here to reset