Online Action Detection in Untrimmed, Streaming Videos - Modeling and Evaluation

by Zheng Shou, et al.

The goal of Online Action Detection (OAD) is to detect an action in a timely manner and to recognize its category. Early works focused on early action detection, which is effectively a classification problem rather than online detection in streaming videos, because these works used partially observed short video clips that begin at the start of the action. Recently, researchers have started to tackle the OAD problem in the challenging setting of untrimmed, streaming videos that contain substantial background shots. However, they evaluate OAD in terms of per-frame labeling, which does not require detection at the instance level and does not evaluate the timeliness of the online detection process. In this paper, we design new evaluation protocols and metrics. Further, to specifically address the challenges of OAD in untrimmed, streaming videos, we propose three novel methods: (1) we design a hard negative sample generation module based on the Generative Adversarial Network (GAN) framework to better distinguish ambiguous background shots that share similar scenes but lack the true characteristics of an action start; (2) during training, we impose a temporal consistency constraint between data around the action start and data succeeding the action start to model their similarity; (3) we introduce an adaptive sampling strategy to handle the scarcity of the important training data around the action start. We conduct extensive experiments on THUMOS'14 and ActivityNet. We show that our proposed strategies lead to significant performance gains and improve on state-of-the-art results. A systematic ablation study also confirms the effectiveness of each proposed method.
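The abstract contrasts per-frame labeling with evaluation that is instance-level and sensitive to timeliness, but it does not spell out the protocol. The following is only an illustrative sketch of one way such an evaluation could look, not the paper's actual metric: each predicted action start is matched to at most one ground-truth start of the same class, and it counts as correct only if it falls within a temporal tolerance. The names `match_starts`, `precision_recall`, and `tolerance`, as well as the greedy score-ordered matching, are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical data layout (an assumption, not taken from the paper):
# a ground-truth start is (time_in_seconds, class_label);
# a prediction is (time_in_seconds, class_label, confidence_score).
Start = Tuple[float, str]
Pred = Tuple[float, str, float]


def match_starts(preds: List[Pred], gts: List[Start], tolerance: float) -> List[bool]:
    """Greedily match predictions to ground-truth starts.

    Predictions are visited in order of descending score. Each one is
    matched to the closest unused ground-truth start of the same class
    that lies within `tolerance` seconds. Returns one hit flag per
    prediction, in that score order.
    """
    used = [False] * len(gts)
    hits = []
    for t, label, _score in sorted(preds, key=lambda p: -p[2]):
        best, best_gap = None, tolerance
        for i, (gt_t, gt_label) in enumerate(gts):
            gap = abs(t - gt_t)
            if not used[i] and gt_label == label and gap <= best_gap:
                best, best_gap = i, gap
        if best is not None:
            used[best] = True  # each ground-truth instance is matched at most once
        hits.append(best is not None)
    return hits


def precision_recall(preds: List[Pred], gts: List[Start], tolerance: float) -> Tuple[float, float]:
    """Instance-level precision and recall at a given temporal tolerance."""
    hits = match_starts(preds, gts, tolerance)
    tp = sum(hits)
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    gts = [(10.0, "dive"), (50.0, "jump")]
    preds = [(10.5, "dive", 0.9), (11.0, "dive", 0.8), (58.0, "jump", 0.7)]
    for tol in (1.0, 10.0):
        print(tol, precision_recall(preds, gts, tol))
```

In this toy example, the duplicate "dive" prediction is never rewarded because its ground-truth instance is already taken, and the late "jump" prediction counts only under the looser tolerance. Per-frame labeling would capture neither effect.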


