Audio-attention discriminative language model for ASR rescoring

12/06/2019
by Ankur Gandhe, et al.

End-to-end approaches for automatic speech recognition (ASR) benefit from directly modeling the probability of the word sequence given the input audio stream in a single neural network. However, compared to conventional ASR systems, these models typically require more data to achieve comparable results. Well-known model adaptation techniques, which account for differences in domain and style, are not easily applicable to end-to-end systems. Conventional HMM-based systems, on the other hand, have been optimized for various production environments and use cases. In this work, we propose to combine the benefits of end-to-end approaches with a conventional system using an attention-based discriminative language model that learns to rescore the output of a first-pass ASR system. We show that learning to rescore a list of potential ASR outputs is much simpler than learning to generate the hypothesis. The proposed model results in an 8% improvement in word error rate even when the amount of training data is a fraction of the data used for training the first-pass system.
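The rescoring setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the `Hypothesis` type, the `rescore_nbest` function, the linear interpolation of scores, and the `weight` parameter are all assumptions made for illustration, and the `rescorer` callable merely stands in for a trained discriminative model (which, in the paper's setting, would also attend over the audio).

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Hypothesis:
    """One entry of a first-pass n-best list (hypothetical structure)."""
    text: str
    first_pass_score: float  # log-domain score, higher is better


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    rescorer: Callable[[Hypothesis], float],
    weight: float = 0.5,
) -> Hypothesis:
    """Return the hypothesis with the best interpolated score.

    The combined score is (1 - weight) * first-pass score
    + weight * rescorer score. Linear interpolation is an assumption
    of this sketch, not something stated in the abstract.
    """
    if not nbest:
        raise ValueError("n-best list must not be empty")
    return max(
        nbest,
        key=lambda h: (1.0 - weight) * h.first_pass_score + weight * rescorer(h),
    )


# Toy stand-in for a trained rescoring model: a fixed lookup table.
TOY_SCORES = {"play the beetles": -3.0, "play the beatles": -0.5}


def toy_rescorer(hyp: Hypothesis) -> float:
    return TOY_SCORES[hyp.text]


nbest = [
    Hypothesis("play the beetles", -1.0),  # first-pass favourite
    Hypothesis("play the beatles", -1.2),
]
best = rescore_nbest(nbest, toy_rescorer, weight=0.5)
print(best.text)
```

The rescorer only has to rank a short list of candidates that the first-pass system has already produced, rather than search the full space of word sequences, which is the sense in which rescoring is a simpler task than generation.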

