Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes

by Xiang Hao, et al.

To help customers make better-informed viewing choices, video-streaming services try to moderate their content and provide more visibility into which portions of their movies and TV episodes contain age-appropriate material (e.g., nudity, sex, violence, or drug use). Supervised models that localize these sensitive activities require large amounts of clip-level labeled data, which is hard to obtain, while weakly-supervised models for the same task usually do not offer competitive accuracy. To address this challenge, we propose a novel Coarse2Fine network designed to make use of readily obtainable video-level weak labels in conjunction with sparse clip-level labels of age-appropriate activities. Our model aggregates frame-level predictions to make video-level classifications and is therefore able to leverage sparse clip-level labels along with video-level labels. Furthermore, by performing frame-level predictions in a hierarchical manner, our approach is able to overcome the label-imbalance problem caused by the rare occurrence of age-appropriate content. We present comparative results of our approach using 41,234 movies and TV episodes (~3 years of video content) from 521 sub-genres and 250 countries, making it by far the largest-scale empirical analysis of age-appropriate activity localization in long-form videos ever published. Our approach offers a 107.2% relative mAP improvement (from a 5.5% baseline) over existing activity-localization approaches.
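The two ideas named in the abstract, aggregating frame-level predictions into a video-level prediction and making frame-level predictions hierarchically, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the aggregation rule or how the coarse and fine stages are combined, so top-k mean pooling and multiplicative gating are stand-ins, and the function names `video_score` and `hierarchical_frame_scores` are hypothetical rather than taken from the Coarse2Fine network.

```python
from typing import List, Sequence


def hierarchical_frame_scores(coarse: Sequence[float],
                              fine: Sequence[float]) -> List[float]:
    """Combine a coarse per-frame score with a fine per-frame score.

    Assumption: the coarse stage flags frames that might contain any
    sensitive activity, and the fine stage scores one specific activity.
    Multiplying the two suppresses fine scores on frames the coarse stage
    considers background, which is one way to cope with rare positives.
    """
    if len(coarse) != len(fine):
        raise ValueError("coarse and fine must have the same length")
    return [c * f for c, f in zip(coarse, fine)]


def video_score(frame_scores: Sequence[float], k: int = 3) -> float:
    """Aggregate frame-level scores into one video-level score.

    Assumption: top-k mean pooling, so a video-level weak label can
    supervise the frames that score highest.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    k = max(1, min(k, len(frame_scores)))  # clamp k to the valid range
    top = sorted(frame_scores, reverse=True)[:k]
    return sum(top) / k


# Hypothetical scores for a four-frame video.
coarse = [0.1, 0.9, 0.8, 0.05]
fine = [0.5, 0.9, 0.7, 0.9]
frames = hierarchical_frame_scores(coarse, fine)
print(video_score(frames, k=2))
```

In this sketch the last frame has a high fine score but a low coarse score, so it contributes little to the video-level score. Under such a scheme, a video-level label would constrain the pooled score while any available clip-level labels would constrain the per-frame scores directly.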



W-TALC: Weakly-supervised Temporal Activity Localization and Classification

Most activity localization methods in the literature suffer from the bur...

Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos

In videos that contain actions performed unintentionally, agents do not ...

Graph Convolution Neural Network For Weakly Supervised Abnormality Localization In Long Capsule Endoscopy Videos

Temporal activity localization in long videos is an important problem. T...

Spatio-Temporal Action Localization in a Weakly Supervised Setting

Enabling computational systems with the ability to localize actions in v...

Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization

Temporally localizing activities within untrimmed videos has been extens...

Action Localization through Continual Predictive Learning

The problem of action recognition involves locating the action in the vi...

Detection and Classification of Viewer Age Range Smart Signs at TV Broadcast

In this paper, the identification and classification of Viewer Age Range...
