Regional Attention with Architecture-Rebuilt 3D Network for RGB-D Gesture Recognition

by Benjia Zhou, et al.

Human gesture recognition has drawn much attention in computer vision. However, its performance is often degraded by gesture-irrelevant factors such as the background and the performers' clothing, so focusing on the hand/arm regions is important. Meanwhile, an architecture-searched network can perform better than block-fixed ones such as ResNet, because it increases the diversity of features at different stages of the network. In this paper, we propose a regional attention with architecture-rebuilt 3D network (RAAR3DNet) for gesture recognition. Because features differ in shape and representation ability at the early, middle, and late stages of the network, we replace the fixed Inception modules with structures rebuilt automatically via Neural Architecture Search (NAS). This enables the network to capture different levels of feature representation at different layers more adaptively. We also design a stackable regional attention module called Dynamic-Static Attention (DSA), which derives a Gaussian guidance heatmap and a dynamic motion map to highlight the hand/arm regions and the motion information in the spatial and temporal domains, respectively. Extensive experiments on two recent large-scale RGB-D gesture datasets validate the effectiveness of the proposed method and show that it outperforms state-of-the-art methods. The code for our method is available at:
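To make the two attention cues named in the abstract more concrete, the following is a minimal NumPy sketch of a Gaussian guidance heatmap and a frame-difference motion map. It is an illustration under stated assumptions, not the authors' DSA module: the function names, the `sigma` parameter, the use of a single known hand center (assumed to come from some external keypoint estimator), the use of plain frame differencing as the motion cue, and the residual way the maps reweight features are all hypothetical choices made for this example.

```python
import numpy as np


def gaussian_heatmap(height, width, center, sigma=8.0):
    """Return an (H, W) map that peaks at 1.0 on center = (row, col).

    Assumption: the hand position is already known, e.g. from a keypoint
    estimator; the abstract does not say how it is obtained.
    """
    rows = np.arange(height, dtype=np.float32)[:, None]
    cols = np.arange(width, dtype=np.float32)[None, :]
    cy, cx = center
    sq_dist = (rows - cy) ** 2 + (cols - cx) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def motion_map(frames):
    """Return (T, H, W) motion maps in [0, 1] for a (T, H, W) grayscale clip.

    Assumption: absolute frame differencing stands in for the paper's
    dynamic motion map. The first frame has no predecessor, so its map is zero.
    """
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))  # (T-1, H, W)
    diffs = np.concatenate([np.zeros_like(diffs[:1]), diffs], axis=0)
    peak = diffs.max()
    return diffs / peak if peak > 0 else diffs


def apply_attention(features, static_map, dynamic_map):
    """Reweight (T, H, W) features with both maps.

    The residual form (1 + maps) is an assumption; it keeps the original
    features and only amplifies the highlighted regions.
    """
    return features * (1.0 + static_map[None] + dynamic_map)


if __name__ == "__main__":
    clip = np.zeros((4, 32, 32), dtype=np.float32)
    clip[2, 5:8, 5:8] = 1.0  # a small patch appears in frame 2
    static = gaussian_heatmap(32, 32, center=(6, 6), sigma=4.0)
    dynamic = motion_map(clip)
    out = apply_attention(np.ones_like(clip), static, dynamic)
    print(out.shape)
```

In this sketch the static map depends only on spatial position and is shared across frames, while the dynamic map changes per frame, which mirrors the spatial/temporal split described in the abstract.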


Construct Dynamic Graphs for Hand Gesture Recognition via Spatial-Temporal Attention

We propose a Dynamic Graph-Based Spatial-Temporal Attention (DG-STA) met...

Skeleton-Based Hand Gesture Recognition by Learning SPD Matrices with Neural Networks

In this paper, we propose a new hand gesture recognition method based on...

Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion

Gesture recognition is a hot topic in computer vision and pattern recogn...

Motion Feature Augmented Recurrent Neural Network for Skeleton-based Dynamic Hand Gesture Recognition

Dynamic hand gesture recognition has attracted increasing interests beca...

Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition

RGB-D action and gesture recognition remain an interesting topic in huma...

Multi-Modality Fusion based on Consensus-Voting and 3D Convolution for Isolated Gesture Recognition

Recently, the popularity of depth-sensors such as Kinect has made depth ...

RDFNet: Regional Dynamic FISTA-Net for Spectral Snapshot Compressive Imaging

Deep convolutional neural networks have recently shown promising results...
