SDW-ASL: A Dynamic System to Generate Large Scale Dataset for Continuous American Sign Language

10/13/2022
by Yehong Jiang, et al.

Despite tremendous progress in natural language processing using deep learning techniques in recent years, sign language production and comprehension have advanced very little. One critical barrier is the lack of large-scale public datasets, caused by the prohibitive cost of generating labeled data. Efforts to provide public data for American Sign Language (ASL) comprehension have yielded two datasets, each comprising more than a thousand video clips. These datasets are large enough to enable a meaningful start to deep learning research on sign languages, but far too small to lead to any solution that can be deployed in practice. So far, there is still no suitable dataset for ASL production. We propose a system that can generate large-scale datasets for continuous ASL. It is suitable for general ASL processing and is particularly useful for ASL production. The continuous ASL dataset contains English-labeled human articulations in condensed body pose data formats. To better serve the research community, we are releasing the first version of our ASL dataset, which contains 30k sentences and 416k words, with a vocabulary of 18k words, totaling 104 hours. In terms of video duration, this is the largest continuous sign language dataset published to date. We also describe how the system can evolve and expand the dataset to incorporate better data processing techniques and more content as it becomes available. We hope that releasing this ASL dataset and the sustainable dataset generation system to the public will propel deep learning research in ASL natural language processing.
