ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling

06/15/2021
by   Ashish Shenoy, et al.
4

Automatic Speech Recognition (ASR) robustness toward slot entities are critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross utterance contextual cues play an important role in disambiguating domain specific content words from speech. In this paper, we investigate various techniques to improve contextualization, content word robustness and domain adaptation of a Transformer-XL neural language model (NLM) to rescore ASR N-best hypotheses. To improve contextualization, we utilize turn level dialogue acts along with cross utterance context carry over. Additionally, to adapt our domain-general NLM towards e-commerce on-the-fly, we use embeddings derived from a finetuned masked LM on in-domain data. Finally, to improve robustness towards in-domain content words, we propose a multi-task model that can jointly perform content word detection and language modeling tasks. Compared to a non-contextual LSTM LM baseline, our best performing NLM rescorer results in a content WER reduction of 19.2 F1 improvement of 6.4

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/23/2020

Multi-task Language Modeling for Improving Speech Recognition of Rare Words

End-to-end automatic speech recognition (ASR) systems are increasingly p...
research
06/02/2021

Attention-based Contextual Language Model Adaptation for Speech Recognition

Language modeling (LM) for automatic speech recognition (ASR) does not u...
research
04/21/2021

Adapting Long Context NLM for ASR Rescoring in Conversational Agents

Neural Language Models (NLM), when trained and evaluated with context sp...
research
01/28/2020

Joint Contextual Modeling for ASR Correction and Language Understanding

The quality of automatic speech recognition (ASR) is critical to Dialogu...
research
11/05/2021

Conversational speech recognition leveraging effective fusion methods for cross-utterance language modeling

Conversational speech normally is embodied with loose syntactic structur...
research
02/12/2021

Transformer Language Models with LSTM-based Cross-utterance Information Representation

The effective incorporation of cross-utterance information has the poten...
research
10/30/2022

Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings

In embedding-matching acoustic-to-word (A2W) ASR, every word in the voca...

Please sign up or login with your details

Forgot password? Click here to reset