Deep set conditioned latent representations for action recognition

12/21/2022
by Akash Singh, et al.

In recent years, multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANNs) still struggle to classify them. In the real world, atomic actions often connect temporally to form more complex composite actions. The challenge lies in recognising composite actions of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concepts of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two-stream relational network that employs a latent representation of state and a visual representation for reasoning over events and actions. The network learns to reason about temporally connected actions in order to identify all of them in the video. The proposed method achieves an improvement of around 1.49 in atomic action recognition and 17.57 in composite action recognition over an I3D-NL baseline on the CATER dataset.
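The "unordered set-based latent representation" mentioned in the abstract can be illustrated with a generic Deep Sets style encoder, which embeds each set element independently, pools the embeddings with a sum, and transforms the pooled vector. The sketch below is not the SCI3D architecture: the layer sizes, the random weights, and the helper names (`make_linear`, `linear_tanh`, `deep_set_encode`) are all assumptions chosen only to show why such an encoding does not depend on the order of its inputs.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Return a random weight matrix with n_out rows of n_in weights (illustrative only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def linear_tanh(weights, x):
    """Apply a bias-free linear layer followed by tanh to a single vector."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def deep_set_encode(elements, phi_w, rho_w):
    """Encode a set of vectors as rho(sum_i phi(x_i))."""
    # phi: embed every element independently of the others.
    embedded = [linear_tanh(phi_w, x) for x in elements]
    # Sum pooling over elements; a sum does not depend on element order.
    pooled = [math.fsum(column) for column in zip(*embedded)]
    # rho: map the pooled vector to the final latent representation.
    return linear_tanh(rho_w, pooled)


rng = random.Random(0)
phi_w = make_linear(4, 8, rng)   # hypothetical sizes: 4-d object state -> 8-d embedding
rho_w = make_linear(8, 3, rng)   # 8-d pooled vector -> 3-d latent

# Five hypothetical per-object state vectors (e.g. attributes of objects in a frame).
objects = [[rng.random() for _ in range(4)] for _ in range(5)]

latent = deep_set_encode(objects, phi_w, rho_w)
latent_reversed = deep_set_encode(list(reversed(objects)), phi_w, rho_w)

# The two latents agree because the pooling step ignores ordering.
print(all(math.isclose(a, b) for a, b in zip(latent, latent_reversed)))
```

Sum pooling is one common choice; mean or max pooling would preserve the same order-independence, and in a trained model the `phi` and `rho` weights would be learned rather than sampled at random.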


