Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

06/14/2023
by Zheng Liang, et al.

Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and perform well on general speech recognition. However, several challenging scenarios remain in which E2E models still struggle, such as code-switching and named entity recognition (NER). Data augmentation is a common and effective remedy for both scenarios. Current data augmentation methods, however, rely mainly on audio splicing and text-to-speech (TTS) models, which can produce discontinuous, unrealistic, and insufficiently diverse speech. To mitigate these issues, we propose a novel data augmentation method based on a text-based speech editing model. The augmented speech produced by speech editing systems is more coherent and diverse, and closer to real speech. Experimental results on code-switching and NER tasks show that the proposed method significantly outperforms data augmentation systems based on audio splicing and neural TTS.
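To make the text side of this idea concrete, the following is a minimal sketch of how edit requests for a text-based speech editing model might be prepared. It is not the authors' implementation: the names `EditRequest`, `make_edit_request`, and `augment` are hypothetical, and the sketch assumes that the editing model takes the original transcript, the edited transcript, and the token span to regenerate. The speech editing model itself is out of scope here.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EditRequest:
    """One text-side edit that a speech editing model would be asked to realize.

    Hypothetical container: the fields are assumptions, not a real model's API.
    """
    original_tokens: List[str]
    edited_tokens: List[str]
    span_start: int  # index of the first replaced token in the original transcript
    span_end: int    # index one past the last replaced token in the original transcript


def make_edit_request(tokens: Sequence[str], span_start: int, span_end: int,
                      replacement: Sequence[str]) -> EditRequest:
    """Replace tokens[span_start:span_end] with the replacement tokens."""
    if not (0 <= span_start < span_end <= len(tokens)):
        raise ValueError("span must be a non-empty range inside the transcript")
    edited = list(tokens[:span_start]) + list(replacement) + list(tokens[span_end:])
    return EditRequest(list(tokens), edited, span_start, span_end)


def augment(tokens: Sequence[str], span_start: int, span_end: int,
            candidates: Sequence[Sequence[str]], n: int, seed: int = 0) -> List[EditRequest]:
    """Build up to n edit requests, each swapping the span for a different candidate."""
    rng = random.Random(seed)  # fixed seed so the augmentation is reproducible
    picks = rng.sample(list(candidates), k=min(n, len(candidates)))
    return [make_edit_request(tokens, span_start, span_end, p) for p in picks]
```

For example, `make_edit_request(["call", "john", "smith", "now"], 1, 3, ["maria", "garcia"])` marks the name as the region to regenerate while the surrounding words, and in a speech editing setting their original audio, stay untouched. The same pattern would apply to code-switching by substituting a span with words from another language.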


research
11/04/2020

Data Augmentation for End-to-end Code-switching Speech Recognition

Training a code-switching end-to-end automatic speech recognition (ASR) ...
research
10/19/2022

EnTDA: Entity-to-Text based Data Augmentation Approach for Named Entity Recognition Tasks

Data augmentation techniques have been used to improve the generalizatio...
research
05/22/2020

End-to-end Named Entity Recognition from English Speech

Named entity recognition (NER) from text has been a widely studied probl...
research
02/17/2022

AISHELL-NER: Named Entity Recognition from Chinese Speech

Named Entity Recognition (NER) from speech is among Spoken Language Unde...
research
03/17/2019

Audio De-identification: A New Entity Recognition Task

Named Entity Recognition (NER) has been mostly studied in the context of...
research
05/30/2022

Adversarial synthesis based data-augmentation for code-switched spoken language identification

Spoken Language Identification (LID) is an important sub-task of Automat...
research
02/09/2023

Data Augmentation for Robust Character Detection in Fantasy Novels

Named Entity Recognition (NER) is a low-level task often used as a found...
