Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

by Hao Zhou, et al.

Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features and ignore other potentially non-trivial and informative content. This characteristic heavily constrains their capability to learn the implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, which aim to preserve the uniqueness and explore the collaboration of multiple cues, respectively. Finally, we design a joint optimization strategy to achieve end-to-end sequence learning of the STMC network. To validate its effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.
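The two parallel paths of the TMC module can be illustrated with a minimal sketch. This is not the authors' implementation: the names `temporal_average` and `tmc_block` are hypothetical, a fixed moving average stands in for the learned temporal layers, and the cue names and feature sizes are made up for illustration. It only shows the data flow the abstract describes: the intra-cue path processes each cue separately, while the inter-cue path models the cues jointly after fusing them frame by frame.

```python
from typing import Dict, List, Tuple

Sequence = List[List[float]]  # T frames, each a feature vector


def temporal_average(seq: Sequence, kernel: int = 3) -> Sequence:
    """Smooth a sequence along time with a centered moving average.

    Assumption: this fixed filter is only a stand-in for a learned
    temporal layer; it is not taken from the paper.
    """
    half = kernel // 2
    out = []
    for t in range(len(seq)):
        # Window is truncated at the sequence boundaries.
        window = seq[max(0, t - half): t + half + 1]
        dim = len(seq[t])
        out.append(
            [sum(frame[d] for frame in window) / len(window) for d in range(dim)]
        )
    return out


def tmc_block(
    cues: Dict[str, Sequence], kernel: int = 3
) -> Tuple[Dict[str, Sequence], Sequence]:
    """Hypothetical two-path block: returns (intra-cue, inter-cue) outputs."""
    # Intra-cue path: each cue is modeled on its own, so its features
    # are never mixed with those of other cues.
    intra = {name: temporal_average(seq, kernel) for name, seq in cues.items()}

    # Inter-cue path: concatenate all cues per frame (in sorted name
    # order), then apply temporal modeling to the fused sequence.
    names = sorted(cues)
    num_frames = len(cues[names[0]])
    fused = [
        [x for name in names for x in cues[name][t]] for t in range(num_frames)
    ]
    inter = temporal_average(fused, kernel)
    return intra, inter


# Toy input: two cues, three frames, one feature per cue per frame.
cues = {
    "hand": [[0.0], [3.0], [6.0]],
    "face": [[1.0], [1.0], [1.0]],
}
intra, inter = tmc_block(cues)
```

In this toy run, `intra` keeps one sequence per cue, whereas `inter` holds a single sequence whose per-frame vectors contain the features of all cues side by side.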




Multi-View Spatial-Temporal Network for Continuous Sign Language Recognition

Sign language is a beautiful visual language and is also the primary lan...

Temporal superimposed crossover module for effective continuous sign language

The ultimate goal of continuous sign language recognition (CSLR) is to fa...

Improving Continuous Sign Language Recognition with Consistency Constraints and Signer Removal

Most deep-learning-based continuous sign language recognition (CSLR) mod...

SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding

Hand gesture serves as a crucial role during the expression of sign lang...

Skeleton-based Gesture Recognition Using Several Fully Connected Layers with Path Signature Features and Temporal Transformer Module

The skeleton based gesture recognition is gaining more popularity due to...

A Dynamic Spatial-temporal Attention Network for Early Anticipation of Traffic Accidents

Recently, autonomous vehicles and those equipped with an Advanced Driver...

Temporal Lift Pooling for Continuous Sign Language Recognition

Pooling methods are necessities for modern neural networks for increasin...
