A flexible model for training action localization with varying levels of supervision

06/29/2018
by Guilhem Chéron, et al.

Spatio-temporal action detection in videos is typically addressed in a fully supervised setup that requires manual annotation of training videos at every frame. Since such annotation is extremely tedious and prohibits scalability, there is a clear need to minimize the amount of manual supervision. In this work we propose a unifying framework that can handle and combine varying types of less demanding weak supervision. Our model is based on discriminative clustering and integrates different types of supervision as constraints on the optimization. We investigate applications of such a model to training setups with alternative supervisory signals, ranging from video-level class labels, through temporal points or sparse action bounding boxes, to full per-frame annotation of action bounding boxes. Experiments on the challenging UCF101-24 and DALY datasets demonstrate competitive performance of our method with a fraction of the supervision used by previous methods. The flexibility of our model enables joint learning from data with different levels of annotation. Experimental results demonstrate a significant gain from adding a few fully supervised examples to otherwise weakly labeled videos.

research
05/29/2018

Pointly-Supervised Action Localization

This paper strives for spatio-temporal localization of human actions in ...
research
06/05/2020

WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos

Online action detection in untrimmed videos aims to identify an action a...
research
04/24/2023

End-to-End Spatio-Temporal Action Localisation with Video Transformers

The most performant spatio-temporal action localisation models use exter...
research
01/24/2019

General Supervision via Probabilistic Transformations

Different types of training data have led to numerous schemes for superv...
research
02/19/2023

Accelerated Video Annotation driven by Deep Detector and Tracker

Annotating object ground truth in videos is vital for several downstream...
research
07/20/2022

A Generalized Robust Framework For Timestamp Supervision in Temporal Action Segmentation

In temporal action segmentation, Timestamp supervision requires only a h...
research
05/05/2021

Towards Self-Supervision for Video Identification of Individual Holstein-Friesian Cattle: The Cows2021 Dataset

In this paper we publish the largest identity-annotated Holstein-Friesia...
