A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks

10/21/2020
by Yun Tang, et al.

Attention-based sequence-to-sequence modeling provides a powerful and elegant solution for applications that need to map one sequence to a different sequence. Its success, however, relies heavily on the availability of large amounts of training data. This presents a challenge for speech applications such as automatic speech recognition (ASR) and speech translation (ST), where labelled speech data is very expensive to obtain. In this study, we propose a general multi-task learning framework that leverages text data for ASR and ST tasks. Two auxiliary tasks, a denoising autoencoder task and a machine translation task, are co-trained with the ASR and ST tasks, respectively. We demonstrate that representing text input as phoneme sequences can reduce the difference between speech and text inputs and enhance knowledge transfer from text corpora to the speech-to-text tasks. Our experiments show that the proposed method achieves a relative 10 to 15% word error rate reduction on the Librispeech task and improves speech translation quality on the MuST-C tasks by 4.2 to 11.1 BLEU.
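The abstract does not spell out the training objective or the corruption used for the denoising autoencoder. As a rough illustration only, the sketch below assumes that an auxiliary text task is co-trained by adding its loss to the speech task's loss with a tunable weight, and that the autoencoder's phoneme input is corrupted by random deletion and masking. The noise scheme, the `<mask>` token, and the names `add_noise` and `multitask_loss` are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence

MASK = "<mask>"  # hypothetical mask token, assumed for illustration


def add_noise(phonemes: Sequence[str], mask_prob: float, drop_prob: float,
              rng: random.Random) -> List[str]:
    """Corrupt a phoneme sequence for a denoising autoencoder (assumed noise scheme).

    Each phoneme is deleted with probability drop_prob, replaced by MASK with
    probability mask_prob, and otherwise kept unchanged.
    """
    if mask_prob < 0 or drop_prob < 0 or mask_prob + drop_prob > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    noisy: List[str] = []
    for p in phonemes:
        r = rng.random()  # uniform in [0, 1)
        if r < drop_prob:
            continue  # delete this phoneme
        if r < drop_prob + mask_prob:
            noisy.append(MASK)  # hide this phoneme behind the mask token
        else:
            noisy.append(p)  # keep the phoneme as-is
    return noisy


def multitask_loss(primary_loss: float, auxiliary_loss: float,
                   aux_weight: float) -> float:
    """Weighted sum of the speech task loss and the text-based auxiliary loss."""
    if aux_weight < 0:
        raise ValueError("aux_weight must be non-negative")
    return primary_loss + aux_weight * auxiliary_loss


if __name__ == "__main__":
    # ARPAbet-style phonemes for "hello world" (stress markers omitted)
    phones = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
    print(add_noise(phones, mask_prob=0.15, drop_prob=0.1, rng=random.Random(0)))
    print(multitask_loss(primary_loss=2.0, auxiliary_loss=1.5, aux_weight=0.5))
```

The autoencoder would be trained to reconstruct the clean phoneme sequence from the noisy one. Because its input is phonemes rather than characters or words, it sits closer to what the speech encoder sees, which is the intuition the abstract gives for improved knowledge transfer.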


