Video Imprint

06/07/2021
by Zhanning Gao, et al.

A new unified video analytics framework (ER3) is proposed for complex event retrieval, recognition and recounting, based on the proposed video imprint representation, which exploits temporal correlations among image features across video frames. The video imprint can be mapped back to both temporal and spatial locations in the video frames, which allows both key frame identification and key area localization within each frame. In the proposed framework, a dedicated feature alignment module removes redundancy across frames to produce the tensor representation, i.e., the video imprint. The video imprint is then fed separately into a reasoning network, for event recognition and recounting, and into a feature aggregation module, for event retrieval. Thanks to its attention mechanism, inspired by the memory networks used in language modeling, the proposed reasoning network can simultaneously recognize the event category and locate the key pieces of evidence for event recounting. In addition, the latent structure in the reasoning network highlights areas of the video imprint that can be used directly for event recounting. For the event retrieval task, the compact video representation aggregated from the video imprint yields better retrieval results than existing state-of-the-art methods.
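
As a rough illustration of the two uses of the video imprint described above, the sketch below treats the imprint as a small grid of feature vectors, applies a single softmax attention step over the grid cells (in the spirit of a memory-network read), and separately pools the cells into one compact descriptor. It is not the authors' architecture: the grid size, feature dimension, random weights, single attention hop, and the sum-pooling with L2 normalisation are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(imprint, query):
    """One attention read over the imprint cells (hypothetical helper).

    imprint: list of feature vectors, one per grid cell (flattened grid).
    query:   vector of the same dimension.
    Returns the attention weights and the attention-weighted summary vector.
    """
    weights = softmax([dot(cell, query) for cell in imprint])
    dim = len(query)
    summary = [sum(w * cell[d] for w, cell in zip(weights, imprint))
               for d in range(dim)]
    return weights, summary


def classify(summary, class_weights):
    """Linear classifier on the summary vector (assumed, untrained weights)."""
    return softmax([dot(summary, w) for w in class_weights])


def aggregate(imprint):
    """Sum-pool the cells and L2-normalise: an assumed stand-in for aggregation."""
    dim = len(imprint[0])
    pooled = [sum(cell[d] for cell in imprint) for d in range(dim)]
    norm = math.sqrt(dot(pooled, pooled)) or 1.0
    return [p / norm for p in pooled]


# Toy data: a 4x4 imprint grid with 8-dimensional features and 3 event classes.
random.seed(0)
GRID, DIM, CLASSES = 4, 8, 3
imprint = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(GRID * GRID)]
query = [random.gauss(0, 1) for _ in range(DIM)]
class_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(CLASSES)]

weights, summary = attend(imprint, query)   # recognition / recounting path
probs = classify(summary, class_weights)
descriptor = aggregate(imprint)             # retrieval path

# The most attended cell plays the role of the "key evidence" location.
top_cell = max(range(len(weights)), key=weights.__getitem__)
print("most attended cell (row, col):", divmod(top_cell, GRID))
print("class probabilities:", [round(p, 3) for p in probs])
```

In this toy setup the attention weights double as a heat map over the imprint grid, which is the sense in which a single pass can yield both a class prediction and a localisation of evidence; the descriptor from `aggregate` could be compared across videos with a dot product for retrieval.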
