Explaining Motion Relevance for Activity Recognition in Video Deep Learning Models

by Liam Hiley, et al.

A small subset of explainability techniques developed initially for image recognition models has recently been applied to interpret 3D Convolutional Neural Network models in activity recognition tasks. Much like the models themselves, these techniques require little or no modification to be compatible with 3D inputs. However, they treat spatial and temporal information jointly, so a user cannot explicitly distinguish the role of motion in a 3D model's decision. In fact, it has been shown that these models do not appropriately factor motion information into their decisions. We propose a selective relevance method for adapting 2D explanation techniques to provide motion-specific explanations, better aligning them with the human understanding of motion as conceptually separate from static spatial features. We demonstrate the utility of our method in conjunction with several widely used 2D explanation methods and show that it improves explanation selectivity for motion. Our results show that the selective relevance method not only provides insight into the role played by motion in the model's decision, in effect revealing and quantifying the model's spatial bias, but also simplifies the resulting explanations for human consumption.
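The abstract does not spell out the filtering step, but one plausible reading is that relevance attributed by a 2D-style explanation method is kept only where the input (or the relevance itself) changes over time. The sketch below illustrates that idea under stated assumptions: the function name `selective_relevance`, the grayscale clip layout `(T, H, W)`, and the fixed threshold are all hypothetical choices for illustration, not the authors' exact formulation.

```python
import numpy as np

def selective_relevance(relevance, clip, threshold=0.1):
    """Motion-selective filtering of a 3D explanation (illustrative sketch).

    relevance: (T, H, W) relevance/saliency volume produced by any
               2D-style explanation method applied to a 3D model.
    clip:      (T, H, W) grayscale input clip.

    Keeps relevance only at locations where the input changes over
    time, approximating a motion-specific explanation.
    """
    # Temporal derivative of the input: large magnitude where pixels move.
    motion = np.abs(np.diff(clip, axis=0))            # (T-1, H, W)
    # Repeat the first difference so the mask matches the clip length.
    motion = np.concatenate([motion[:1], motion], axis=0)  # (T, H, W)
    # Normalise to [0, 1] and threshold to obtain a binary motion mask.
    motion = motion / (motion.max() + 1e-8)
    mask = motion > threshold
    # Static regions are zeroed out; moving regions keep their relevance.
    return relevance * mask
```

Applied to a uniform relevance volume, the filter suppresses relevance at static pixels while retaining it where the clip changes between frames, which is the "spatial bias" diagnostic the abstract describes.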


Discriminating Spatial and Temporal Relevance in Deep Taylor Decompositions for Explainable Activity Recognition

Current techniques for explainable AI have been applied with some succes...

ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition

It is difficult for people to interpret the decision-making in the infer...

Aggregating explainability methods for neural networks stabilizes explanations

Despite a growing literature on explaining neural networks, no consensus...

Spatial-temporal Concept based Explanation of 3D ConvNets

Recent studies have achieved outstanding success in explaining 2D image ...

Combined Static and Motion Features for Deep-Networks Based Activity Recognition in Videos

Activity recognition in videos in a deep-learning setting---or otherwise...

Adding Why to What? Analyses of an Everyday Explanation

In XAI it is important to consider that, in contrast to explanations for...

Efficient data-driven encoding of scene motion using Eccentricity

This paper presents a novel approach of representing dynamic visual scen...
