Faster Video Moment Retrieval with Point-Level Supervision

05/23/2023
by Xun Jiang, et al.

Video Moment Retrieval (VMR) aims to retrieve the most relevant events from an untrimmed video given a natural language query. Existing VMR methods suffer from two defects: (1) they require massive amounts of expensive temporal annotations to reach satisfactory performance; (2) they deploy complicated cross-modal interaction modules, which incur high computational cost and make the retrieval process inefficient. To address these issues, we propose a novel method termed Cheaper and Faster Moment Retrieval (CFMR), which balances retrieval accuracy, efficiency, and annotation cost for VMR. Specifically, CFMR learns from point-level supervision, where each annotation is a single frame randomly located within the target moment; such annotations are 6 times cheaper than conventional annotations of event boundaries. Furthermore, we design a concept-based multimodal alignment mechanism that bypasses cross-modal interaction modules during inference, remarkably improving retrieval efficiency. Experimental results on three widely used VMR benchmarks demonstrate that CFMR establishes a new state of the art with point-level supervision. Moreover, it significantly accelerates retrieval, reducing FLOPs by more than 100 times compared to existing approaches with point-level supervision.
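The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the CFMR implementation: the function names `sample_point_annotation` and `retrieve_moment`, the cosine scoring, and the threshold-based span growing (the `ratio` parameter) are hypothetical choices made here for illustration, and the concept-based alignment itself is not modeled. The sketch shows (a) a point-level label as one frame drawn uniformly from inside the target moment, and (b) interaction-free inference, assuming clip features are encoded offline and the query is encoded separately, so scoring reduces to one similarity per clip instead of running a cross-modal module for every query-video pair.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_point_annotation(start: int, end: int, rng: random.Random) -> int:
    """Hypothetical point-level label: one frame index drawn uniformly
    from the target moment [start, end] (both ends inclusive)."""
    if start > end:
        raise ValueError("start must not exceed end")
    return rng.randint(start, end)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_moment(clip_feats: List[List[float]], query_feat: List[float],
                    ratio: float = 0.8) -> Tuple[int, int]:
    """Hypothetical interaction-free retrieval.

    Each pre-encoded clip is scored against the separately encoded query
    (no cross-modal module). The span grows outward from the best clip
    while neighbouring scores stay >= ratio * peak score.
    Returns (first_clip, last_clip), both inclusive.
    """
    if not clip_feats:
        raise ValueError("need at least one clip")
    scores = [cosine(c, query_feat) for c in clip_feats]
    peak = max(range(len(scores)), key=scores.__getitem__)
    cutoff = ratio * scores[peak]
    lo = hi = peak
    while lo > 0 and scores[lo - 1] >= cutoff:
        lo -= 1
    while hi < len(scores) - 1 and scores[hi + 1] >= cutoff:
        hi += 1
    return lo, hi
```

For example, with clip features `[[0, 1], [0.8, 0.2], [1, 0], [0.9, 0.1], [0.1, 1]]` and query `[1, 0]`, `retrieve_moment` returns `(1, 3)`. Because the clip features never depend on the query, they can be cached once per video; this is the general kind of saving the abstract attributes to avoiding cross-modal interaction at inference.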



04/20/2022

Video Moment Retrieval from Text Queries via Single Frame Annotation

Video moment retrieval aims at finding the start and end timestamps of a...
04/20/2021

T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval

Text-video retrieval is a challenging task that aims to search relevant ...
09/23/2022

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

As an increasingly popular task in multimedia information retrieval, vid...
01/24/2020

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

We introduce a new multimodal retrieval task - TV show Retrieval (TVR), ...
06/18/2020

Language Guided Networks for Cross-modal Moment Retrieval

We address the challenging task of cross-modal moment retrieval, which a...
07/02/2023

Referring Video Object Segmentation with Inter-Frame Interaction and Cross-Modal Correlation

Referring video object segmentation (RVOS) aims to segment the target ob...
09/04/2020

Video Moment Retrieval via Natural Language Queries

In this paper, we propose a novel method for video moment retrieval (VMR...
