Modeling long-term interactions to enhance action recognition

04/23/2021
by Alejandro Cartas, et al.

In this paper, we propose a new approach to understanding actions in egocentric videos that exploits the semantics of object interactions at both the frame and temporal levels. At the frame level, we use a region-based approach that takes as input a primary region roughly corresponding to the user's hands and a set of secondary regions potentially corresponding to the interacting objects, and calculates the action score through a CNN formulation. This information is then fed to a Hierarchical Long Short-Term Memory Network (HLSTM) that captures temporal dependencies between actions within and across shots. Ablation studies thoroughly validate the proposed approach, showing in particular that both levels of the HLSTM architecture contribute to the performance improvement. Furthermore, quantitative comparisons show that the proposed approach outperforms the state of the art in action recognition on standard benchmarks, without relying on motion information.
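
The frame-level step can be illustrated with a small sketch. The abstract does not spell out how the primary and secondary regions are combined, so the code below assumes one common region-based formulation: each action's score is the primary-region score plus the best score among the secondary regions. The function names and the numeric scores are hypothetical, and the per-region scores stand in for the outputs of a CNN that is not modeled here.

```python
import math
from typing import List, Sequence


def combine_region_scores(
    primary: Sequence[float],
    secondary: Sequence[Sequence[float]],
) -> List[float]:
    """Combine per-action scores from one primary and several secondary regions.

    Assumption (not stated in the abstract): the frame score for an action is
    the primary-region score plus the maximum secondary-region score.
    """
    combined = []
    for action_idx, primary_score in enumerate(primary):
        # Best supporting evidence for this action among the candidate objects;
        # falls back to 0.0 when no secondary region is available.
        best_secondary = max((region[action_idx] for region in secondary), default=0.0)
        combined.append(primary_score + best_secondary)
    return combined


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three actions in a single frame.
primary_scores = [1.0, 0.5, -0.2]            # hands region
secondary_scores = [
    [0.1, 2.0, 0.0],                         # candidate object region 1
    [0.3, -1.0, 0.4],                        # candidate object region 2
]

frame_scores = combine_region_scores(primary_scores, secondary_scores)
frame_probs = softmax(frame_scores)
predicted_action = max(range(len(frame_probs)), key=frame_probs.__getitem__)
```

In the approach described above, per-frame outputs of this kind would then be passed to the HLSTM, which models dependencies within and across shots; that temporal stage is not sketched here.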


