Classifying Collisions with Spatio-Temporal Action Graph Networks

by Roei Herzig, et al.

Events defined by the interaction of objects in a scene are often of critical importance, yet such events are typically rare, and the available labeled examples are insufficient to train a conventional deep model that performs well across expected object appearances. Most deep learning activity recognition models focus on global context aggregation and do not explicitly consider object interactions inside the video, potentially overlooking important cues relevant to interpreting activity in the scene. In this paper, we show that a new model for explicit representation of object interactions significantly improves deep video activity classification for driving collision detection. We propose a Spatio-Temporal Action Graph (STAG) network, which incorporates spatial and temporal relations of objects. The network is automatically learned from data, with a latent graph structure inferred for the task. As a benchmark to evaluate performance on collision detection tasks, we introduce a novel dataset based on data obtained from real-life driving collisions and near-collisions. This dataset reflects the challenging task of detecting and classifying accidents in a richly varying yet highly constrained setting that is very relevant to the evaluation of autonomous driving and alerting systems. Our experiments confirm that our STAG model offers significantly improved results for collision activity classification.




