Ordered Pooling of Optical Flow Sequences for Action Recognition

01/12/2017
by Jue Wang, et al.

Training Convolutional Neural Networks (CNNs) on long video sequences is computationally expensive because of the substantial memory requirements and the massive number of parameters that deep architectures demand. Early fusion of video frames is thus a standard technique, in which several consecutive frames are first aggregated into a compact representation and then fed into the CNN as a single input sample. For this purpose, a summarization approach was recently proposed that represents a set of consecutive RGB frames by a single dynamic image capturing the pixel dynamics. In this paper, we introduce a novel ordered representation of consecutive optical flow frames as an alternative and argue that this representation captures the action dynamics more effectively than RGB frames. We provide intuitions on why such a representation is better for action recognition. We validate our claims on standard benchmark datasets and demonstrate that using summaries of flow images leads to significant improvements over RGB frames while achieving accuracy comparable to the state of the art on the UCF101 and HMDB datasets.
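To make the idea of an order-aware summary concrete, the following is a minimal sketch rather than the method of this paper. It assumes a stack of optical flow frames stored as a NumPy array of shape (T, H, W, 2), with the last axis holding horizontal and vertical flow, and it collapses the stack into one image using linear temporal weights of the form 2t - T - 1. That weighting is a simplification commonly associated with approximate rank pooling for dynamic images. The function name `ordered_summary` and the array layout are hypothetical choices for illustration; the abstract does not specify how the ordered flow representation is computed.

```python
import numpy as np


def ordered_summary(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, ...) stack of frames into a single order-aware image.

    Assumption: linear weights alpha_t = 2t - T - 1 for t = 1..T, so later
    frames count positively and earlier frames negatively. This is an
    illustrative stand-in, not the optimization used in the paper.
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    t = np.arange(1, num_frames + 1)
    alpha = 2.0 * t - num_frames - 1.0  # e.g. T=3 gives [-2, 0, 2]
    # Weighted sum over the time axis; the remaining axes are preserved.
    return np.tensordot(alpha, frames, axes=(0, 0))


# Usage: three 4x4 flow frames whose values grow over time.
flow = np.stack([np.full((4, 4, 2), v) for v in (1.0, 2.0, 3.0)])
summary = ordered_summary(flow)          # shape (4, 4, 2)
reversed_summary = ordered_summary(flow[::-1])
```

Because the weights sum to zero, a static sequence maps to an all-zero image, and reversing the frame order flips the sign of the summary. This is the sense in which such a summary encodes temporal order rather than only average appearance.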

research
11/29/2017

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

Motion representation plays a vital role in human action recognition in ...
research
12/02/2016

Action Recognition with Dynamic Image Networks

We introduce the concept of "dynamic image", a novel compact representat...
research
06/14/2017

Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition

Webly-supervised learning has recently emerged as an alternative paradig...
research
11/24/2016

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos

We propose a novel method for temporally pooling frames in a video for t...
research
01/16/2020

Rethinking Motion Representation: Residual Frames with 3D ConvNets for Better Action Recognition

Recently, 3D convolutional networks yield good performance in action rec...
research
06/21/2020

Motion Representation Using Residual Frames with 3D CNN

Recently, 3D convolutional networks (3D ConvNets) yield good performance...
research
05/16/2018

Fast Retinomorphic Event Stream for Video Recognition and Reinforcement Learning

Good temporal representations are crucial for video understanding, and t...