SLTUNET: A Simple Unified Model for Sign Language Translation

by Biao Zhang, et al.

Despite recent successes with neural models for sign language translation (SLT), translation quality still lags behind that of spoken-language translation because of data scarcity and the modality gap between sign video and text. To address both problems, we investigate strategies for cross-modality representation sharing for SLT. We propose SLTUNET, a simple unified neural model designed to support multiple SLT-related tasks jointly, such as sign-to-gloss, gloss-to-text, and sign-to-text translation. Jointly modeling different tasks endows SLTUNET with the capability to explore cross-task relatedness that could help narrow the modality gap. In addition, it allows us to leverage knowledge from external resources, such as the abundant parallel data used for spoken-language machine translation (MT). We show in experiments that SLTUNET achieves competitive and even state-of-the-art performance on PHOENIX-2014T and CSL-Daily when augmented with MT data and equipped with a set of optimization techniques. We further use the DGS Corpus for end-to-end SLT for the first time. It covers broader domains with a significantly larger vocabulary, which makes it more challenging and, we believe, allows a more realistic assessment of the current state of SLT than the former two datasets. Still, SLTUNET obtains improved results on the DGS Corpus. Code is available at


