Spatio-Temporal Fusion Networks for Action Recognition

06/17/2019
by Sangwoo Cho, et al.

Video-based CNN approaches have focused on effective ways to fuse appearance and motion networks, but they typically do not exploit temporal information across video frames. In this work, we present a novel spatio-temporal fusion network (STFN) that integrates the temporal dynamics of appearance and motion information from entire videos. The captured temporal dynamics are then aggregated into a better video-level representation, and the network is learned via end-to-end training. STFN consists of two sets of Residual Inception blocks that extract temporal dynamics, and a fusion connection for the appearance and motion features. The benefits of STFN are: (a) it captures local and global temporal dynamics of complementary data to learn video-wide information; and (b) it can be applied to any video classification network to boost performance. We explore a variety of design choices for STFN and use ablation studies to verify how network performance varies with them. We perform experiments on two challenging human activity datasets, UCF101 and HMDB51, and our best network achieves state-of-the-art results.
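The abstract gives no layer-level details, so the following is only a toy sketch of the general idea, not the authors' STFN. It assumes per-frame appearance (RGB) and motion (optical-flow) features of shape (T, C) have already been extracted. Each "Residual Inception" block is modelled, as an assumption, by parallel 1D temporal convolutions with kernel sizes 1, 3 and 5 (fixed averaging weights standing in for learned ones) whose mean is passed through a ReLU and added back to the input. The two streams are fused by element-wise addition and averaged over time into one video-level vector. All function names are hypothetical.

```python
import numpy as np

def temporal_conv1d(x, kernel):
    """Same-length 1D convolution along the time axis (hypothetical helper).

    x: (T, C) per-frame features; kernel: (k,) weights with odd k,
    shared across channels. Edges are handled by repeating boundary frames.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(x, dtype=float)
    for t in range(x.shape[0]):
        # Weighted sum over a window of k frames centred on frame t.
        out[t] = np.tensordot(kernel, xp[t:t + k], axes=(0, 0))
    return out

def residual_inception_temporal(x, kernel_sizes=(1, 3, 5)):
    """Assumed Inception-style temporal block: parallel branches with
    different temporal extents (local vs. wider context), averaged,
    rectified, and added to the input as a residual connection."""
    branches = [temporal_conv1d(x, np.full(k, 1.0 / k)) for k in kernel_sizes]
    mixed = np.mean(branches, axis=0)  # (T, C)
    return x + np.maximum(mixed, 0.0)

def fuse_video(appearance, motion):
    """Hypothetical fusion: one temporal block per stream, element-wise sum,
    then temporal average pooling to a single (C,) video-level vector."""
    a = residual_inception_temporal(appearance)
    m = residual_inception_temporal(motion)
    fused = a + m  # fusion connection between the two streams
    return fused.mean(axis=0)
```

For example, `fuse_video(np.random.rand(16, 512), np.random.rand(16, 512))` returns a 512-dimensional vector that a classifier could consume. In the actual network the convolution weights would be learned end to end rather than fixed to moving averages.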

research
04/25/2019

DynamoNet: Dynamic Action and Motion Network

In this paper, we are interested in self-supervised learning the motion ...
research
02/14/2021

Learning Self-Similarity in Space and Time as Generalized Motion for Action Recognition

Spatio-temporal convolution often fails to learn motion dynamics in vide...
research
04/19/2018

Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition

Acquiring spatio-temporal states of an action is the most crucial step f...
research
11/21/2016

Deep Temporal Linear Encoding Networks

The CNN-encoding of features from entire videos for the representation o...
research
03/01/2021

Coarse-Fine Networks for Temporal Activity Detection in Videos

In this paper, we introduce 'Coarse-Fine Networks', a two-stream archite...
research
05/13/2023

Lightweight Delivery Detection on Doorbell Cameras

Despite recent advances in video-based action recognition and robust spa...
research
08/01/2016

Exploiting Temporal Information for DCNN-based Fine-Grained Object Classification

Fine-grained classification is a relatively new field that has concentra...