Distillation of Human-Object Interaction Contexts for Action Recognition

12/17/2021
by Muna Almushyti, et al.

Modeling spatial-temporal relations is imperative for recognizing human actions, especially when a human interacts with objects and multiple objects appear around the human differently over time. Most existing action recognition models focus on learning the overall visual cues of a scene but disregard informative fine-grained features, which can be captured by learning human-object relationships and interactions. In this paper, we learn human-object relationships by exploiting the interaction of their local and global contexts. We propose the Global-Local Interaction Distillation Network (GLIDN), which learns human and object interactions through space and time via knowledge distillation for fine-grained scene understanding. GLIDN encodes humans and objects into graph nodes and learns local and global relations via a graph attention network. The local context graphs learn the relations between humans and objects at the frame level by capturing their co-occurrence at a specific time step. The global relation graph is constructed from video-level human and object interactions, identifying their long-term relations throughout a video sequence. More importantly, we investigate how knowledge from these graphs can be distilled to their counterparts to improve human-object interaction (HOI) recognition. We evaluate our model through comprehensive experiments on two datasets, Charades and CAD-120. Our model achieves better results than the baselines and counterpart approaches.
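
The two ingredients named in the abstract, attention over human and object graph nodes and distillation between a local and a global graph, can be illustrated with a minimal sketch. This is not the GLIDN implementation: the abstract does not give the attention form or the loss, so the sketch assumes plain dot-product attention (a simplification of a graph attention layer, which would use learned weights) and a generic temperature-softened KL divergence as the distillation term. All function names, feature vectors, and logits here are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(human, objects):
    """Aggregate object node features into a context vector for a human node.

    Assumption: unlearned dot-product attention stands in for a graph
    attention layer; a real layer would apply learned projections.
    """
    scores = [sum(h * o for h, o in zip(human, obj)) for obj in objects]
    weights = softmax(scores)
    dim = len(human)
    return [sum(w * obj[i] for w, obj in zip(weights, objects)) for i in range(dim)]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Assumption: a generic knowledge-distillation term, not the paper's loss.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


# Hypothetical frame-level example: one human node and two object nodes.
human_node = [1.0, 0.0]
object_nodes = [[1.0, 0.0], [0.0, 1.0]]
local_context = attention_aggregate(human_node, object_nodes)

# Hypothetical class logits from a local (student) and a global (teacher) graph.
local_logits = [0.2, 1.5, -0.3]
global_logits = [0.1, 2.0, -0.5]
loss = distillation_loss(local_logits, global_logits)
```

In this sketch the global graph plays the teacher and the local graph the student; the abstract describes distilling knowledge from these graphs to their counterparts, so swapping the two arguments gives the other direction.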


research
11/16/2017

Attend and Interact: Higher-Order Object Interactions for Video Understanding

Human actions often involve complex interactions across several inter-re...
research
11/17/2022

Sub-Graph Learning for Spatiotemporal Forecasting via Knowledge Distillation

One of the challenges in studying the interactions in large graphs is to...
research
08/10/2023

Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation

The task of dynamic scene graph generation (DynSGG) aims to generate sce...
research
01/10/2019

Multi-Granularity Reasoning for Social Relation Recognition from Images

Discovering social relations in images can make machines better interpre...
research
10/17/2020

Self-Selective Context for Interaction Recognition

Human-object interaction recognition aims for identifying the relationsh...
research
11/12/2020

Adding Knowledge to Unsupervised Algorithms for the Recognition of Intent

Computer vision algorithms performance are near or superior to humans in...
research
06/05/2018

Videos as Space-Time Region Graphs

How do humans recognize the action "opening a book" ? We argue that ther...
