GoG: Relation-aware Graph-over-Graph Network for Visual Dialog

by Feilong Chen, et al.

Visual dialog, which aims to hold a meaningful conversation with humans about a given image, is a challenging task that requires models to reason about the complex dependencies among visual content, dialog history, and the current question. Graph neural networks have recently been applied to model the implicit relations between objects in an image or dialog. However, they neglect the importance of 1) coreference relations within the dialog history and dependency relations between words for the question representation; and 2) the representation of the image based on the fully represented question. Therefore, we propose a novel relation-aware graph-over-graph network (GoG) for visual dialog. Specifically, GoG consists of three sequential graphs: 1) H-Graph, which aims to capture coreference relations within the dialog history; 2) History-aware Q-Graph, which aims to fully understand the question by capturing dependency relations between words based on coreference resolution on the dialog history; and 3) Question-aware I-Graph, which aims to capture the relations between objects in an image based on the full question representation. We add GoG to an existing visual dialog model as an additional feature representation module. Experimental results show that our model outperforms the strong baseline in both generative and discriminative settings by a significant margin.
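To make the "three sequential graphs" idea concrete, the following is a minimal toy sketch of how such a pipeline could be wired, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`graph_layer`, `mean_pool`, `gog_forward`), the dot-product attention, the additive conditioning on the previous graph's summary, and the mean pooling. The edge sets stand in for coreference links (H-Graph), dependency arcs (Q-Graph), and object relations (I-Graph), which a real system would obtain from external tools or learn.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def graph_layer(nodes, edges, context=None):
    """One round of attention-weighted message passing (illustrative only).

    nodes:   list of equal-length feature vectors (lists of floats)
    edges:   set of (i, j) pairs, meaning node i receives a message from node j
    context: optional vector added to every node first; this is the assumed
             way of conditioning one graph on the summary of the previous one
    """
    if context is not None:
        nodes = [[x + c for x, c in zip(n, context)] for n in nodes]
    updated = []
    for i, node in enumerate(nodes):
        # A node without incoming edges falls back to attending to itself.
        neighbours = [j for (a, j) in sorted(edges) if a == i] or [i]
        weights = softmax([dot(node, nodes[j]) for j in neighbours])
        message = [
            sum(w * nodes[j][d] for w, j in zip(weights, neighbours))
            for d in range(len(node))
        ]
        # Average the node with its aggregated message.
        updated.append([(x + m) / 2 for x, m in zip(node, message)])
    return updated


def mean_pool(nodes):
    """Summarise a graph as the mean of its node vectors."""
    count = len(nodes)
    return [sum(n[d] for n in nodes) / count for d in range(len(nodes[0]))]


def gog_forward(history, h_edges, words, q_edges, objects, i_edges):
    """Chain the three graphs: history -> question -> image."""
    h_summary = mean_pool(graph_layer(history, h_edges))                    # H-Graph
    q_summary = mean_pool(graph_layer(words, q_edges, context=h_summary))   # history-aware Q-Graph
    return mean_pool(graph_layer(objects, i_edges, context=q_summary))      # question-aware I-Graph
```

The point of the sketch is the ordering: the image graph only sees the question after the question has been conditioned on the dialog history, so changing the history changes the final image representation even when the question and image are fixed.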



DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog

Visual Dialog is a vision-language task that requires an AI agent to eng...

Reasoning Visual Dialogs with Structural and Partial Observations

We propose a novel model to address the task of Visual Dialog which exhi...

Interactive Text Graph Mining with a Prolog-based Dialog Engine

On top of a neural network-based dependency parser and a graph-based nat...

Modality-Balanced Models for Visual Dialogue

The Visual Dialog task requires a model to exploit both image and conver...

Relations World: A Possibilistic Graphical Model

We explore the idea of using a "possibilistic graphical model" as the ba...

Iterative Context-Aware Graph Inference for Visual Dialog

Visual dialog is a challenging task that requires the comprehension of t...

Video Dialog as Conversation about Objects Living in Space-Time

It would be a technological feat to be able to create a system that can ...
