Learning to Compose Dynamic Tree Structures for Visual Contexts

12/05/2018
by   Kaihua Tang, et al.
20

We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A. Our visual context tree model, dubbed VCTree, has two key advantages over existing structured object representations including chains and fully-connected graphs: 1) The efficient and expressive binary tree encodes the inherent parallel/hierarchical relationships among objects, e.g., "clothes" and "pants" are usually co-occur and belong to "person"; 2) the dynamic structure varies from image to image and task to task, allowing more content-/task-specific message passing among objects. To construct a VCTree, we design a score function that calculates the task-dependent validity between each object pair, and the tree is the binary version of the maximum spanning tree from the score matrix. Then, visual contexts are encoded by bidirectional TreeLSTM and decoded by task-specific models. We develop a hybrid learning procedure which integrates end-task supervised learning and the tree structure reinforcement learning, where the former's evaluation result serves as a self-critic for the latter's structure exploration. Experimental results on two benchmarks, which require reasoning over contexts: Visual Genome for scene graph generation and VQA2.0 for visual Q&A, show that VCTree outperforms state-of-the-art results while discovering interpretable visual context structures.

READ FULL TEXT

page 1

page 3

page 6

page 8

page 13

page 14

research
01/10/2017

Scene Graph Generation by Iterative Message Passing

Understanding a visual scene goes beyond recognizing individual objects ...
research
12/06/2018

Scene Dynamics: Counterfactual Critic Multi-Agent Training for Scene Graph Generation

Scene graphs -- objects as nodes and visual relationships as edges -- de...
research
08/12/2020

HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation

Scene graph generation aims to produce structured representations for im...
research
09/11/2018

Context-Dependent Diffusion Network for Visual Relationship Detection

Visual relationship detection can bridge the gap between computer vision...
research
07/17/2020

Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation

Scene graph aims to faithfully reveal humans' perception of image conten...
research
01/02/2023

Task-specific Scene Structure Representations

Understanding the informative structures of scenes is essential for low-...
research
04/17/2019

Graph based Dynamic Segmentation of Generic Objects in 3D

We propose a novel 3D segmentation method for RBGD stream data to deal w...

Please sign up or login with your details

Forgot password? Click here to reset