When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs

by Oana Ignat et al.
University of Michigan

We consider the task of temporal human action localization in lifestyle vlogs. We introduce a novel dataset consisting of manual temporal-localization annotations for 13,000 narrated actions in 1,200 video clips. We present an extensive analysis of this data, which allows us to better understand how the language and visual modalities interact throughout the videos. We propose a simple yet effective method to localize the narrated actions based on their expected duration. Through several experiments and analyses, we show that our method provides information complementary to previous methods and improves over prior work on the task of temporal action localization.
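The abstract does not spell out how expected duration is used, so as a purely illustrative sketch (our own assumption, not the paper's method), one might anchor a window of the action's expected duration on the narration timestamp and clip it to the video boundaries:

```python
# Hypothetical sketch, NOT the paper's actual method: localize a narrated
# action by centering a window of the expected duration on the narration
# timestamp, clipped to the [0, clip_length] range of the video clip.

def localize_by_duration(narration_time, expected_duration, clip_length):
    """Return a (start, end) interval of length expected_duration
    centered on narration_time, shifted to stay inside the clip."""
    half = expected_duration / 2.0
    start = max(0.0, narration_time - half)
    end = min(clip_length, start + expected_duration)
    # If clipping at the clip's end shortened the window, shift start back.
    start = max(0.0, end - expected_duration)
    return start, end
```

For example, an action narrated at second 30 with an expected duration of 10 seconds in a 60-second clip would be localized to the interval (25.0, 35.0); near the clip boundaries the window is shifted rather than truncated.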




Related research:

OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization

Learning to Localize Actions from Moments

A Novel Online Action Detection Framework from Untrimmed Video Streams

Scale Matters: Temporal Scale Aggregation Network for Precise Action Localization in Untrimmed Videos

Temporal Action Localization using Long Short-Term Dependency

TALL: Temporal Activity Localization via Language Query
