LinkNet: Relational Embedding for Scene Graph

11/15/2018
by   Sanghyun Woo, et al.
0

Objects and their relationships are critical contents for image understanding. A scene graph provides a structured description that captures these properties of an image. However, reasoning about the relationships between objects is very challenging and only a few recent works have attempted to solve the problem of generating a scene graph from an image. In this paper, we present a method that improves scene graph generation by explicitly modeling inter-dependency among the entire object instances. We design a simple and effective relational embedding module that enables our model to jointly represent connections among all related objects, rather than focus on an object in isolation. Our method significantly benefits the main part of the scene graph generation task: relationship classification. Using it on top of a basic Faster R-CNN, our model achieves state-of-the-art results on the Visual Genome benchmark. We further push the performance by introducing global context encoding module and geometrical layout encoding module. We validate our final model, LinkNet, through extensive ablation studies, demonstrating its efficacy in scene graph generation.

READ FULL TEXT

page 8

page 9

research
07/12/2021

Scenes and Surroundings: Scene Graph Generation using Relation Transformer

Identifying objects in an image and their mutual relationships as a scen...
research
06/23/2023

Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation

Scene Graph Generation (SGG) aims to structurally and comprehensively re...
research
06/16/2021

Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions

In this work, we seek new insights into the underlying challenges of the...
research
11/09/2018

Image-Level Attentional Context Modeling Using Nested-Graph Neural Networks

We introduce a new scene graph generation method called image-level atte...
research
10/12/2021

Topic Scene Graph Generation by Attention Distillation from Caption

If an image tells a story, the image caption is the briefest narrator. G...
research
11/09/2020

After All, Only The Last Neuron Matters: Comparing Multi-modal Fusion Functions for Scene Graph Generation

From object segmentation to word vector representations, Scene Graph Gen...
research
01/14/2020

NODIS: Neural Ordinary Differential Scene Understanding

Semantic image understanding is a challenging topic in computer vision. ...

Please sign up or login with your details

Forgot password? Click here to reset