Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation

10/13/2022
by Jinhui Ye, et al.

Sign language gloss translation aims to translate sign glosses into spoken language text, a task made challenging by the scarcity of labeled gloss-text parallel data. Back-translation (BT), which generates pseudo-parallel data by translating in-domain spoken language texts into sign glosses, has been applied to alleviate this data scarcity. However, the lack of large-scale, high-quality in-domain spoken language text limits the effect of BT. In this paper, to overcome this limitation, we propose a Prompt-based domain text Generation (PGEN) approach that produces large-scale in-domain spoken language text. Specifically, PGEN randomly concatenates sentences from the original in-domain spoken language text data to form prompts, which induce a pre-trained language model (i.e., GPT-2) to generate spoken language texts in a similar style. Experimental results on three sign language gloss translation benchmarks covering different languages demonstrate that BT with spoken language texts generated by PGEN significantly outperforms the compared methods. In addition, as the scale of the spoken language texts generated by PGEN increases, BT achieves further improvements, demonstrating the effectiveness of our approach. We release the code and data to facilitate future research in this field.
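The prompt construction and back-translation steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released code: the function names (`build_prompt`, `generate_domain_texts`, `make_pseudo_parallel`), the number of sentences per prompt `k`, the separator, and the callables `generate_fn` (standing in for a GPT-2 style generator) and `text_to_gloss` (standing in for a trained text-to-gloss model) are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple


def build_prompt(sentences: List[str], k: int, rng: random.Random,
                 sep: str = " ") -> str:
    """Randomly pick k distinct in-domain sentences and join them into a prompt."""
    if not 1 <= k <= len(sentences):
        raise ValueError("k must be between 1 and the number of sentences")
    return sep.join(rng.sample(sentences, k))


def generate_domain_texts(sentences: List[str],
                          generate_fn: Callable[[str], str],
                          n_prompts: int, k: int = 3,
                          seed: int = 0) -> List[str]:
    """Build n_prompts random prompts and pass each one to the generator.

    generate_fn is a hypothetical wrapper around a pre-trained language
    model; it takes a prompt string and returns one generated text.
    """
    rng = random.Random(seed)
    return [generate_fn(build_prompt(sentences, k, rng))
            for _ in range(n_prompts)]


def make_pseudo_parallel(texts: List[str],
                         text_to_gloss: Callable[[str], List[str]]
                         ) -> List[Tuple[List[str], str]]:
    """Back-translate each text into glosses, giving (gloss, text) pairs."""
    return [(text_to_gloss(text), text) for text in texts]


if __name__ == "__main__":
    # Made-up in-domain sentences, for illustration only.
    corpus = [
        "Tomorrow it will rain in the north.",
        "The wind blows weakly from the west.",
        "At night the temperature drops to five degrees.",
        "On Friday the sun returns.",
    ]
    # Stand-ins for the real models: echo the prompt, uppercase the tokens.
    texts = generate_domain_texts(corpus, lambda prompt: prompt,
                                  n_prompts=2, k=2)
    pairs = make_pseudo_parallel(texts, lambda t: t.upper().split())
    for gloss, text in pairs:
        print(gloss, "<->", text)
```

Passing the generator and the text-to-gloss model in as callables keeps the sketch independent of any particular library. In practice, `generate_fn` would wrap a language model adapted to the domain, and the resulting pseudo-parallel pairs would be added to the real gloss-text training data.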
