Visual Relation Grounding in Videos

07/17/2020
by Junbin Xiao, et al.

In this paper, we explore a novel task named visual Relation Grounding in Videos (vRGV). The task aims to spatio-temporally localize given relations, expressed in the form subject-predicate-object, in videos, so as to provide supporting visual facts for other high-level video-language tasks (e.g., video-language grounding and video question answering). The challenges in this task include, but are not limited to: (1) both the subject and the object must be spatio-temporally localized to ground a query relation; (2) the temporally dynamic nature of visual relations in videos is difficult to capture; and (3) the grounding must be achieved without any direct supervision in space and time. To ground the relations, we tackle these challenges by collaboratively optimizing two sequences of regions over a constructed hierarchical spatio-temporal region graph through relation attending and reconstruction, in which we further propose a message passing mechanism based on spatial attention shifting between visual entities. Experimental results demonstrate that our model not only outperforms baseline approaches significantly, but also produces visually meaningful facts to support visual grounding. (Code is available at https://github.com/doc-doc/vRGV.)


