Constraint and Union for Partially-Supervised Temporal Sentence Grounding

02/20/2023
by Chen Ju, et al.

Temporal sentence grounding aims to detect the timestamps of events described by a natural language query in untrimmed videos. The existing fully-supervised setting achieves strong performance but requires expensive annotations, while the weakly-supervised setting uses cheap labels but performs poorly. To pursue high performance at lower annotation cost, this paper introduces an intermediate partially-supervised setting, i.e., only short-clip or even single-frame labels are available during training. To take full advantage of partial labels, we propose a novel quadruple constraint pipeline that comprehensively shapes event-query aligned representations, covering intra- and inter-sample as well as uni- and multi-modal relations. The former raises intra-cluster compactness and inter-cluster separability, while the latter enables event-background separation and event-query gathering. To achieve stronger performance with explicit grounding optimization, we further introduce a partial-full union framework, i.e., bridging with an additional fully-supervised branch, to enjoy its grounding bonus while remaining robust to partial annotations. Extensive experiments and ablations on Charades-STA and ActivityNet Captions demonstrate the significance of partial supervision and our superior performance.
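The abstract does not detail the loss forms, but the multi-modal pair of constraints (event-query gathering and event-background separation) can be illustrated with a minimal PyTorch sketch. The function name, the InfoNCE-plus-cosine-margin formulation, and the hyperparameters tau and margin below are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of the multi-modal constraints, assuming pooled
# 1D features per event, background, and query. Loss forms (InfoNCE for
# gathering, cosine margin for separation) are illustrative assumptions.
import torch
import torch.nn.functional as F

def event_query_constraints(event_feat, bg_feat, query_feat, tau=0.07, margin=0.2):
    # event_feat: (B, D) features pooled from the labeled short clip / single frame
    # bg_feat:    (B, D) features pooled from background frames outside the label
    # query_feat: (B, D) sentence-query features
    event = F.normalize(event_feat, dim=-1)
    bg = F.normalize(bg_feat, dim=-1)
    query = F.normalize(query_feat, dim=-1)

    # Event-query gathering: contrastive (InfoNCE) loss over the batch,
    # treating the paired query as positive and other queries as negatives.
    logits = event @ query.t() / tau
    targets = torch.arange(event.size(0), device=event.device)
    gather_loss = F.cross_entropy(logits, targets)

    # Event-background separation: the query should be closer to its event
    # than to background content, enforced with a cosine margin.
    pos_sim = (query * event).sum(-1)
    neg_sim = (query * bg).sum(-1)
    separation_loss = F.relu(margin - (pos_sim - neg_sim)).mean()

    return gather_loss + separation_loss

In a partially-supervised batch, event_feat would come from the annotated short clip or single frame, bg_feat from frames outside it, and query_feat from the sentence encoder; the inter-sample uni-modal constraints (intra-cluster compactness, inter-cluster separability) would be added on top of this term.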

Related research

08/08/2023
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation
Temporal sentence grounding (TSG) aims to locate a specific moment from ...

06/30/2021
Weakly Supervised Temporal Adjacent Network for Language Grounding
Temporal language grounding (TLG) is a fundamental and challenging probl...

10/24/2022
Language-free Training for Zero-shot Video Grounding
Given an untrimmed video and a language query depicting a specific tempo...

09/18/2020
Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos
Temporal grounding of natural language in untrimmed videos is a fundamen...

01/14/2022
Unsupervised Temporal Video Grounding with Deep Semantic Clustering
Temporal video grounding (TVG) aims to localize a target segment in a vi...

03/12/2023
Towards Diverse Temporal Grounding under Single Positive Labels
Temporal grounding aims to retrieve moments of the described event withi...

03/31/2021
Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding
Temporal grounding aims to localize temporal boundaries within untrimmed...