SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data

by Ziyan Yang, et al.

We propose Subject-Conditional Relation Detection (SCoRD), where, conditioned on an input subject, the goal is to predict all of its relations to other objects in a scene along with their locations. Based on the Open Images dataset, we propose a challenging OIv6-SCoRD benchmark in which the training and testing splits have a distribution shift in the occurrence statistics of ⟨subject, relation, object⟩ triplets. To solve this problem, we propose an auto-regressive model that, given a subject, predicts its relations, objects, and object locations by casting this output as a sequence of tokens. First, we show that on this benchmark previous scene-graph prediction methods fail to produce as exhaustive an enumeration of relation-object pairs when conditioned on a subject. In particular, we obtain a recall@3 of 83.8 for our relation-object predictions, compared to 49.75 for a scene graph detector. Then, we show improved generalization on both relation-object and object-box predictions by leveraging, during training, relation-object pairs that are obtained automatically from textual captions and for which no object-box annotations are available. In particular, for ⟨subject, relation, object⟩ triplets for which no object locations are available during training, we are able to obtain a recall@3 of 42.59.
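
To make the "sequence of tokens" formulation and the recall@3 metric more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `<loc_N>` token format, the `NUM_BINS` value, the function names, and the exact recall@k definition (fraction of ground-truth relation-object pairs found among the top-k predictions) are all hypothetical choices made for this example.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # normalized (x1, y1, x2, y2) in [0, 1]
Pair = Tuple[str, str]                   # (relation, object)

NUM_BINS = 1000  # assumed number of coordinate bins; not taken from the paper


def box_to_tokens(box: Box, num_bins: int = NUM_BINS) -> List[str]:
    """Quantize each normalized coordinate into a discrete location token."""
    return [f"<loc_{min(int(c * num_bins), num_bins - 1)}>" for c in box]


def encode_target(relation: str, obj: str, box: Box) -> List[str]:
    """Serialize one (relation, object, box) prediction as a token sequence."""
    return [relation, obj] + box_to_tokens(box)


def decode_target(tokens: List[str], num_bins: int = NUM_BINS) -> Tuple[str, str, Box]:
    """Invert encode_target, mapping each location token to its bin center."""
    relation, obj = tokens[0], tokens[1]
    bins = [int(t[len("<loc_"):-1]) for t in tokens[2:6]]
    x1, y1, x2, y2 = ((b + 0.5) / num_bins for b in bins)
    return relation, obj, (x1, y1, x2, y2)


def recall_at_k(ranked: List[Pair], ground_truth: Set[Pair], k: int = 3) -> float:
    """Fraction of ground-truth pairs that appear among the top-k predictions."""
    if not ground_truth:
        return 0.0
    top_k = set(ranked[:k])
    return sum(1 for pair in ground_truth if pair in top_k) / len(ground_truth)
```

For a subject such as "man", a call like `encode_target("holds", "guitar", (0.25, 0.5, 0.75, 1.0))` yields the relation token, the object token, and four location tokens. An auto-regressive decoder would emit such a sequence one token at a time, which is why relation-object pairs from captions (with no box) can still supervise the first two positions.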



Inferring spatial relations from textual descriptions of images

Generating an image from its textual description requires both a certain...

Weakly-supervised learning of visual relations

This paper introduces a novel approach for modeling visual relations bet...

Improving Relation Extraction by Leveraging Knowledge Graph Link Prediction

Relation extraction (RE) aims to predict a relation between a subject an...

Detecting Objects with Graph Priors and Graph Refinement

The goal of this paper is to detect objects by exploiting their interrel...

Unbiased Scene Graph Generation via Rich and Fair Semantic Extraction

Extracting graph representation of visual scenes in image is a challengi...

COSMO: Contextualized Scene Modeling with Boltzmann Machines

Scene modeling is very crucial for robots that need to perceive, reason ...

Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation

The scene graph generation (SGG) task is designed to identify the predic...
