Visual-Semantic Graph Attention Network for Human-Object Interaction Detection

by Zhijun Liang, et al.

In scene understanding, machines benefit not only from detecting individual scene instances but also from learning their possible interactions. Human-Object Interaction (HOI) detection aims to infer the predicate of a <subject,predicate,object> triplet. Contextual information has been found to be critical for inferring interactions. However, most works use only features from the single object instances that have a direct relation with the subject. Few works have studied the disambiguating contribution of subsidiary relations, or how attention might leverage them for inference. We contribute a dual-graph attention network that dynamically aggregates contextual visual, spatial, and semantic information for primary subject-object relations as well as subsidiary relations. Graph attention networks dynamically leverage node neighborhood information. Our network uses attention to first leverage visual-spatial and semantic cues from primary and subsidiary relations independently, and then combines them before a final readout step. It learns to use primary and subsidiary relations to improve inference by encouraging the right interpretations and discouraging incorrect ones. We call our model Visual-Semantic Graph Attention Networks (VS-GATs). We surpass state-of-the-art HOI detection mAPs on the challenging HICO-DET dataset, including in long-tail cases that are harder to interpret. Code, video, and supplementary information are available at
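
The idea of attending over two graphs independently and then combining them in a readout step can be illustrated with a minimal sketch. This is not the authors' implementation: the scaled dot-product scoring, the residual update, the toy feature vectors, and the names `attend` and `readout` are all assumptions made for illustration, and a real model would use learned attention weights and pass the readout to a learned classifier.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, edges):
    """One attention-weighted aggregation step over a graph.

    features: dict mapping node name -> feature vector (list of floats)
    edges:    dict mapping node name -> list of neighbor node names
    Scoring is a scaled dot product (an assumption, not a learned attention).
    """
    updated = {}
    for node, h in features.items():
        neighbors = edges.get(node, [])
        if not neighbors:
            updated[node] = list(h)  # isolated node: nothing to aggregate
            continue
        scale = math.sqrt(len(h))
        scores = [sum(a * b for a, b in zip(h, features[n])) / scale
                  for n in neighbors]
        weights = softmax(scores)
        aggregated = [sum(w * features[n][i] for w, n in zip(weights, neighbors))
                      for i in range(len(h))]
        # Residual update: keep the node's own feature and add the context.
        updated[node] = [a + b for a, b in zip(h, aggregated)]
    return updated


def readout(visual, semantic, human, obj):
    """Concatenate updated visual and semantic features of a human-object pair."""
    return visual[human] + semantic[human] + visual[obj] + semantic[obj]


# Toy scene: "person"-"bike" is the primary relation; "helmet" supplies
# subsidiary context. All feature values are made up.
visual = {"person": [1.0, 0.0], "bike": [0.0, 1.0], "helmet": [1.0, 1.0]}
semantic = {"person": [0.5, 0.5], "bike": [0.2, 0.8], "helmet": [0.9, 0.1]}
edges = {
    "person": ["bike", "helmet"],
    "bike": ["person", "helmet"],
    "helmet": ["person", "bike"],
}

v = attend(visual, edges)      # attention over the visual graph
s = attend(semantic, edges)    # attention over the semantic graph
pair = readout(v, s, "person", "bike")
print(len(pair))
```

In this toy example the `helmet` node changes the `person` feature even though it is not part of the person-bike pair being classified, which is the role the abstract assigns to subsidiary relations.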




