First-Take-All: Temporal Order-Preserving Hashing for 3D Action Videos

by Jun Ye, et al.

With the prevalence of commodity depth cameras, a new paradigm of user interfaces based on 3D motion capture and recognition has dramatically changed the way humans interact with computers. Human action recognition, as one of the key components of these devices, plays an important role in guaranteeing the quality of the user experience. Although model-driven methods have achieved huge success, they cannot provide a scalable solution for efficiently storing, retrieving and recognizing actions in large-scale applications. These models are also vulnerable to temporal translation and warping, as well as to variations in motion scales and execution rates. To address these challenges, we treat 3D human action recognition as a video-level hashing problem and propose a novel First-Take-All (FTA) Hashing algorithm capable of hashing an entire video into hash codes of fixed length. We demonstrate that the FTA algorithm produces a compact representation of the video that is invariant to the above-mentioned variations, so that action recognition can be solved by an efficient nearest neighbor search using the Hamming distance between FTA hash codes. Experiments on public 3D human action datasets show that the FTA algorithm can reach a recognition accuracy higher than 80%; there are 65 frames per video over the datasets.
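The recognition step described above, a nearest neighbor search by Hamming distance over fixed-length hash codes, can be illustrated with a minimal sketch. It does not reproduce the FTA hashing itself: the 8-bit codes, the action labels, and the helper names `hamming` and `nearest_neighbor_label` are hypothetical and chosen only for illustration. The sketch assumes each video has already been reduced to one fixed-length binary code, stored here as a Python integer.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    return bin(a ^ b).count("1")


def nearest_neighbor_label(query: int, database, k: int = 1) -> str:
    """Return the majority label among the k codes closest to the query.

    `database` is a list of (code, label) pairs. The codes are assumed to
    come from some video-level hashing step that is not shown here.
    """
    ranked = sorted(database, key=lambda item: hamming(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 8-bit codes for four training videos (illustrative values only).
database = [
    (0b10110010, "wave"),
    (0b10110110, "wave"),
    (0b01001101, "kick"),
    (0b01001001, "kick"),
]

query = 0b10100010  # hypothetical code of an unseen video
print(nearest_neighbor_label(query, database))
```

Because every code has the same length regardless of how many frames the video contains, the cost of one comparison does not depend on video duration. A linear scan is used here for clarity; a large-scale system would more plausibly use packed bit arrays or an index over the codes.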




Human Action Recognition without Human

The objective of this paper is to evaluate "human action recognition wit...

Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition

This paper performs the first investigation into depth for large-scale h...

Evaluating Transformers for Lightweight Action Recognition

In video action recognition, transformers consistently reach state-of-th...

A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications

To understand human behaviors, action recognition based on videos is a c...

Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web

Recently, attempts have been made to collect millions of videos to train...

Video Segment Copy Detection Using Memory Constrained Hierarchical Batch-Normalized LSTM Autoencoder

In this report, we introduce a video hashing method for scalable video s...

Privacy-Preserving Action Recognition using Coded Aperture Videos

The risk of unauthorized remote access of streaming video from networked...
