Hierarchical Attention Network for Action Segmentation

05/07/2020
by Harshala Gammulle, et al.

The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in video. Several attempts have been made to capture frame-level salient aspects through attention, but they lack the capacity to effectively map the temporal relationships between frames, as they capture only a limited span of temporal dependencies. To this end, we propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time, thus improving overall segmentation performance. The proposed hierarchical recurrent attention framework analyses the input video at multiple temporal scales to form embeddings at the frame level and the segment level, and performs fine-grained action segmentation. This yields a simple, lightweight, yet extremely effective architecture for segmenting continuous video streams, with multiple application domains. We evaluate our system on multiple challenging public benchmark datasets, including the MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets, and achieve state-of-the-art performance. The evaluated datasets encompass numerous video capture settings, including static overhead camera views and dynamic, egocentric head-mounted camera views, demonstrating the direct applicability of the proposed framework in a variety of settings.
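
To make the two-level idea concrete, the following is a minimal sketch of attention pooling at the frame level and then at the segment level. It is not the authors' architecture: the recurrent layers are omitted, and the fixed window length, the dot-product scoring, and the names `attention_pool`, `hierarchical_embed`, `frame_query`, and `segment_query` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(features, query):
    """Pool (n, d) features into one (d,) vector using dot-product attention.

    `query` is a hypothetical (d,) vector standing in for learned parameters.
    Returns the pooled vector and the (n,) attention weights.
    """
    scores = features @ query          # one scalar score per row
    weights = softmax(scores)          # weights are non-negative and sum to 1
    return weights @ features, weights


def hierarchical_embed(frames, window, frame_query, segment_query):
    """Build segment-level embeddings from frame-level features.

    frames: (n_frames, d) array of per-frame features (assumed precomputed).
    window: assumed fixed number of frames per segment.
    """
    segments = []
    for start in range(0, frames.shape[0], window):
        chunk = frames[start:start + window]
        pooled, _ = attention_pool(chunk, frame_query)   # frame-level attention
        segments.append(pooled)
    segments = np.stack(segments)                        # (n_segments, d)
    # Segment-level attention summarises the longer temporal context.
    context, seg_weights = attention_pool(segments, segment_query)
    return segments, context, seg_weights


# Toy usage with random features: 10 frames, 4-dimensional, windows of 4 frames.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))
segments, context, seg_weights = hierarchical_embed(
    frames, 4, rng.normal(size=4), rng.normal(size=4)
)
```

In a trained model the query vectors would be learned and the per-frame predictions would combine frame features with the segment-level context; here they are random placeholders that only show the data flow.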

