Devil's on the Edges: Selective Quad Attention for Scene Graph Generation

04/07/2023
by Deunsol Jung, et al.

Scene graph generation aims to construct a semantic graph structure from an image such that its nodes and edges respectively represent objects and their relationships. One of the major challenges for the task lies in the presence of distracting objects and relationships in images; contextual reasoning is strongly distracted by irrelevant objects or backgrounds and, more importantly, a vast number of irrelevant candidate relations. To tackle the issue, we propose the Selective Quad Attention Network (SQUAT) that learns to select relevant object pairs and disambiguate them via diverse contextual interactions. SQUAT consists of two main components: edge selection and quad attention. The edge selection module selects relevant object pairs, i.e., edges in the scene graph, which helps contextual reasoning, and the quad attention module then updates the edge features using both edge-to-node and edge-to-edge cross-attentions to capture contextual information between objects and object pairs. Experiments demonstrate the strong performance and robustness of SQUAT, achieving the state of the art on the Visual Genome and Open Images v6 benchmarks.
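The two components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`select_edges`, `cross_attention`, `quad_attention_edge_update`), the `keep_ratio` parameter, the hand-written relevance scores, the single attention head without learned projections, and the residual sum are all assumptions made for illustration. The sketch covers only what the abstract states: keeping the most relevant candidate edges, then updating their features with edge-to-node and edge-to-edge cross-attention.

```python
import math

def select_edges(scores, keep_ratio):
    """Keep the top-scoring fraction of candidate edges (hypothetical rule)."""
    k = max(1, math.ceil(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention; keys double as values."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys_values])
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out

def quad_attention_edge_update(node_feats, edge_feats, selected):
    """Update selected edges from nodes (edge-to-node) and from each other
    (edge-to-edge). Combining by residual sum is an assumption."""
    queries = [edge_feats[i] for i in selected]
    from_nodes = cross_attention(queries, node_feats)
    from_edges = cross_attention(queries, queries)
    return {
        idx: [a + b + c for a, b, c in zip(q, n, e)]
        for idx, q, n, e in zip(selected, queries, from_nodes, from_edges)
    }

# Toy inputs: 3 objects (nodes) and 6 candidate object pairs (edges), 2-D features.
node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edge_feats = [[0.5, 0.1], [0.2, 0.2], [0.9, 0.4],
              [0.1, 0.7], [0.3, 0.3], [0.6, 0.8]]
# Relevance scores would come from a learned module; here they are made up.
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]

selected = select_edges(scores, keep_ratio=0.5)
updated = quad_attention_edge_update(node_feats, edge_feats, selected)
```

Discarding low-scoring pairs before attention means the edge-to-edge step runs only over the retained edges, so irrelevant candidate relations cannot contribute to the context of the relevant ones.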

research
09/11/2021

BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation

Scene graphs are nodes and edges consisting of objects and object-object...
research
07/12/2021

Scenes and Surroundings: Scene Graph Generation using Relation Transformer

Identifying objects in an image and their mutual relationships as a scen...
research
03/29/2018

Iterative Visual Reasoning Beyond Convolutions

We present a novel framework for iterative visual reasoning. Our framewo...
research
11/26/2018

Attentive Relational Networks for Mapping Images to Scene Graphs

Scene graph generation refers to the task of automatically mapping an im...
research
01/07/2020

Visual-Semantic Graph Attention Network for Human-Object Interaction Detection

In scene understanding, machines benefit from not only detecting individ...
research
03/29/2020

GPS-Net: Graph Property Sensing Network for Scene Graph Generation

Scene graph generation (SGG) aims to detect objects in an image along wi...
research
10/17/2020

Self-Selective Context for Interaction Recognition

Human-object interaction recognition aims for identifying the relationsh...
