Jointly Trained Transformers models for Spoken Language Translation

04/25/2020
by Hari Krishna Vydana, et al.

Conventional spoken language translation (SLT) systems are pipeline based: an Automatic Speech Recognition (ASR) system converts the source speech to text, and a Machine Translation (MT) system translates the source text into the target language. Recent progress in sequence-to-sequence architectures has reduced the performance gap between pipeline-based SLT systems (cascaded ASR-MT) and end-to-end approaches. Although end-to-end and cascaded ASR-MT systems now reach comparable levels of performance, there is still a large gap between feeding MT models the ASR hypotheses and feeding them the oracle text, which indicates that MT systems suffer substantial degradation from noisy ASR hypotheses compared with oracle transcripts. In this work, this degradation is reduced by creating an end-to-end differentiable pipeline between the ASR and MT systems: the SLT system is trained with the ASR objective as an auxiliary loss, and the two networks are connected through their neural hidden representations. This training provides an end-to-end differentiable path with respect to the final objective while also exploiting the ASR objective to improve SLT performance. The proposed architecture improves the BLEU score from 36.8 to 44.5. Because of the multi-task training, the model also generates ASR hypotheses, which are passed to a pre-trained MT model; combining the proposed system with this MT model increases the BLEU score by a further 1 point. All experiments are reported on the English-Portuguese speech translation task using the How2 corpus. The final BLEU score is on par with the best speech translation system on the How2 dataset, with no additional training data, no language model, and far fewer parameters.
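To make the joint training scheme concrete, the sketch below shows one way to wire an ASR decoder and an MT decoder together through shared hidden representations and train them with a weighted multi-task loss. This is a minimal illustration in PyTorch, assuming toy layer sizes, vocabulary sizes, an assumed loss weight, and a particular connection point (the MT decoder attending to the ASR decoder's hidden states); it is not the configuration published in the paper.

# Minimal sketch of joint ASR + MT training with an auxiliary ASR loss.
# Dimensions, vocabularies, and the 0.3 loss weight are assumptions.
import torch
import torch.nn as nn

class JointSLT(nn.Module):
    def __init__(self, n_feats=80, d_model=256, src_vocab=5000, tgt_vocab=5000):
        super().__init__()
        self.speech_proj = nn.Linear(n_feats, d_model)
        self.speech_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=6)
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.asr_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=6)
        self.asr_out = nn.Linear(d_model, src_vocab)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        # The MT decoder attends to the ASR decoder's hidden states, so the
        # translation loss back-propagates through the ASR branch as well.
        self.mt_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=6)
        self.mt_out = nn.Linear(d_model, tgt_vocab)

    def forward(self, speech, src_tokens, tgt_tokens):
        enc = self.speech_encoder(self.speech_proj(speech))
        asr_hidden = self.asr_decoder(self.src_embed(src_tokens), enc)
        mt_hidden = self.mt_decoder(self.tgt_embed(tgt_tokens), asr_hidden)
        return self.asr_out(asr_hidden), self.mt_out(mt_hidden)

# Multi-task objective: translation loss plus a weighted ASR auxiliary loss.
model = JointSLT()
speech = torch.randn(2, 100, 80)        # (batch, frames, filterbank features)
src = torch.randint(0, 5000, (2, 20))   # source-language transcript tokens
tgt = torch.randint(0, 5000, (2, 22))   # target-language translation tokens
asr_logits, mt_logits = model(speech, src, tgt)
ce = nn.CrossEntropyLoss()
asr_loss = ce(asr_logits.transpose(1, 2), src)
mt_loss = ce(mt_logits.transpose(1, 2), tgt)
loss = mt_loss + 0.3 * asr_loss
loss.backward()

Because the MT decoder consumes continuous hidden states rather than discrete ASR hypotheses, the gradient of the translation loss reaches the ASR branch, which is what makes the pipeline end-to-end differentiable while the auxiliary ASR loss keeps the intermediate representations anchored to the source transcript.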


