Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR

04/10/2021
by   Fan Yu, et al.
0

Continuous integrate-and-fire (CIF) based models, which use a soft and monotonic alignment mechanism, have been well applied in non-autoregressive (NAR) speech recognition and achieved competitive performance compared with other NAR methods. However, such an alignment learning strategy may also result in inaccurate acoustic boundary estimation and deceleration in convergence speed. To eliminate these drawbacks and improve performance further, we incorporate an additional connectionist temporal classification (CTC) based alignment loss and a contextual decoder into the CIF-based NAR model. Specifically, we use the CTC spike information to guide the leaning of acoustic boundary and adopt a new contextual decoder to capture the linguistic dependencies within a sentence in the conventional CIF model. Besides, a recently proposed Conformer architecture is also employed to model both local and global acoustic dependencies. Experiments on the open-source Mandarin corpora AISHELL-1 show that the proposed method achieves a comparable character error rate (CER) of 4.9 state-of-the-art autoregressive (AR) Conformer model.

READ FULL TEXT
research
05/27/2019

CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

Automatic speech recognition (ASR) system is undergoing an exciting path...
research
04/15/2023

A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition

Recently, end-to-end models have been widely used in automatic speech re...
research
10/28/2020

CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition

We propose a CTC alignment-based single step non-autoregressive transfor...
research
07/02/2022

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Transformer-based models have demonstrated their effectiveness in automa...
research
01/29/2023

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

Conventional ASR systems use frame-level phoneme posterior to conduct fo...
research
12/13/2022

Towards trustworthy phoneme boundary detection with autoregressive model and improved evaluation metric

Phoneme boundary detection has been studied due to its central role in v...
research
05/25/2022

Improving CTC-based ASR Models with Gated Interlayer Collaboration

For Automatic Speech Recognition (ASR), the CTC-based methods have becom...

Please sign up or login with your details

Forgot password? Click here to reset