Spatio-Temporal Action Detection with Multi-Object Interaction

04/01/2020
by Huijuan Xu, et al.

Spatio-temporal action detection in videos requires localizing an action both spatially and temporally in the form of an "action tube". Most current spatio-temporal action detection datasets (e.g., UCF101-24, AVA, DALY) are annotated with action tubes that contain a single person performing the action, so the predominant action detection models simply employ a person detection and tracking pipeline for localization. However, when the action is defined as an interaction between multiple objects, such methods may fail, since each bounding box in the action tube contains multiple objects instead of one person. In this paper, we study the spatio-temporal action detection problem with multi-object interaction. We introduce a new dataset that is annotated with action tubes containing multi-object interactions. Moreover, we propose an end-to-end spatio-temporal action detection model that performs spatial and temporal regression simultaneously. The spatially regressed bounding boxes may enclose multiple objects participating in the action. At test time, we connect the regressed bounding boxes within the predicted temporal duration using a simple heuristic. We report baseline results for our proposed model on this new dataset and also show competitive results on the standard benchmark UCF101-24 using only RGB input.
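
The abstract does not say which heuristic connects the regressed boxes at test time. As an illustration only, the sketch below shows one common choice, greedy linking by confidence score plus overlap with the previously selected box, restricted to a predicted temporal segment. The names `iou`, `link_tube`, and `iou_weight`, and the input layout, are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tube(
    frames: Dict[int, List[Detection]],
    start: int,
    end: int,
    iou_weight: float = 1.0,  # hypothetical trade-off between score and overlap
) -> List[Tuple[int, Box]]:
    """Greedily link one box per frame inside the segment [start, end].

    Assumed heuristic, not the paper's: the first frame takes its
    highest-scoring box; each later frame takes the box that maximizes
    score + iou_weight * IoU with the previously selected box.
    Frames without detections are skipped.
    """
    tube: List[Tuple[int, Box]] = []
    prev = None
    for t in range(start, end + 1):
        candidates = frames.get(t, [])
        if not candidates:
            continue
        if prev is None:
            box, _ = max(candidates, key=lambda d: d[1])
        else:
            box, _ = max(
                candidates,
                key=lambda d: d[1] + iou_weight * iou(prev, d[0]),
            )
        tube.append((t, box))
        prev = box
    return tube


# Toy input: per-frame regressed boxes with scores (made-up values).
frames = {
    0: [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)],
    1: [((1, 1, 11, 11), 0.5), ((50, 50, 60, 60), 0.6)],
    2: [((2, 2, 12, 12), 0.7)],
    3: [((100, 100, 110, 110), 0.99)],  # outside the predicted segment
}
tube = link_tube(frames, start=0, end=2)
```

In this toy input, the box at frame 1 with the lower raw score is still selected because it overlaps the box chosen at frame 0, and frame 3 is ignored because it lies outside the predicted temporal segment.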

research
04/21/2022

A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions

Spatio-temporal action detection is an important and challenging problem...
research
03/01/2019

Progress Regression RNN for Online Spatial-Temporal Action Localization in Unconstrained Videos

Previous spatial-temporal action localization methods commonly follow th...
research
10/30/2021

A Spatio-Temporal Identity Verification Method for Person-Action Instance Search in Movies

As one of the challenging problems in video search, Person-Action Instan...
research
11/19/2020

Towards Spatio-Temporal Video Scene Text Detection via Temporal Clustering

With only bounding-box annotations in the spatial domain, existing video...
research
04/24/2023

End-to-End Spatio-Temporal Action Localisation with Video Transformers

The most performant spatio-temporal action localisation models use exter...
research
03/15/2016

First Person Action-Object Detection with EgoNet

Unlike traditional third-person cameras mounted on robots, a first-perso...
research
06/15/2021

Relation Modeling in Spatio-Temporal Action Localization

This paper presents our solution to the AVA-Kinetics Crossover Challenge...