Attention-Driven Body Pose Encoding for Human Activity Recognition

09/29/2020
by   B Debnath, et al.
3

This article proposes a novel attention-based body pose encoding for human activity recognition that presents a enriched representation of body-pose that is learned. The enriched data complements the 3D body joint position data and improves model performance. In this paper, we propose a novel approach that learns enhanced feature representations from a given sequence of 3D body joints. To achieve this encoding, the approach exploits 1) a spatial stream which encodes the spatial relationship between various body joints at each time point to learn spatial structure involving the spatial distribution of different body joints 2) a temporal stream that learns the temporal variation of individual body joints over the entire sequence duration to present a temporally enhanced representation. Afterwards, these two pose streams are fused with a multi-head attention mechanism. translation. We also capture the contextual information from the RGB video stream using a Inception-ResNet-V2 model combined with a multi-head attention and a bidirectional Long Short-Term Memory (LSTM) network. performance is enhanced through the multi-head attention mechanism. Finally, the RGB video stream is combined with the fused body pose stream to give a novel end-to-end deep model for effective human activity recognition.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro