Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action Recognition

by Jingxuan Hou, et al.

Despite the growing discriminative capabilities of modern deep learning methods for recognition tasks, the inner workings of state-of-the-art models largely remain black boxes. In this paper, we propose a systematic interpretation of the model parameters and hidden representations of Residual Temporal Convolutional Networks (Res-TCN) for action recognition in time-series data. As part of the interpretation analysis, we also propose a Feature Map Decoder, which outputs a representation of the model's hidden variables in the same domain as the input. This analysis lets us expose the model's characteristic learning patterns in an interpretable way. For example, through the diagnosis analysis, we discovered that our model achieves view-point invariance by implicitly learning to rotate the input into a more discriminative view (rotational normalization). Based on the findings from the interpretation analysis, we propose a targeted refinement technique that can generalize to various other recognition models. The proposed work introduces a three-stage paradigm for model learning: training, interpretable diagnosis, and targeted refinement. We validate our approach on NTU RGB+D, a skeleton-based 3D human action recognition benchmark. We show that the proposed workflow is an effective model learning strategy and that the resulting Multi-stream Residual Temporal Convolutional Network (MS-Res-TCN) achieves state-of-the-art performance on NTU RGB+D.
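The abstract reports that the model learns rotational normalization implicitly. For comparison, the sketch below shows what an explicit, hand-written rotational normalization of a single 3D skeleton frame might look like. It is an illustration only, not the authors' method: the function name, the assumption that y is the vertical axis, and the choice of two reference joints (for example the left and right hips) are all hypothetical. The frame is translated so that the reference joints' midpoint sits at the origin, then rotated about the vertical axis so that the left-to-right reference vector points along +x.

```python
import math
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z); y is ASSUMED to be the vertical axis


def rotate_to_canonical_view(frame: List[Joint], left_idx: int, right_idx: int) -> List[Joint]:
    """Hypothetical helper: center a skeleton frame on the midpoint of two
    reference joints, then rotate it about the vertical (y) axis so that the
    left->right reference vector lies along +x in the horizontal x-z plane."""
    lx, ly, lz = frame[left_idx]
    rx, ry, rz = frame[right_idx]

    # Translate so the midpoint of the reference joints becomes the origin.
    mx, my, mz = (lx + rx) / 2, (ly + ry) / 2, (lz + rz) / 2
    centered = [(x - mx, y - my, z - mz) for x, y, z in frame]

    # Heading angle of the reference vector in the horizontal x-z plane.
    # If both joints share the same x and z, atan2(0, 0) returns 0 (no rotation).
    theta = math.atan2(rz - lz, rx - lx)
    c, s = math.cos(theta), math.sin(theta)

    # Rotation about y that maps the direction (cos theta, sin theta) onto (1, 0);
    # the vertical coordinate y is left unchanged.
    return [(x * c + z * s, y, -x * s + z * c) for x, y, z in centered]
```

Because the output depends only on the skeleton's geometry relative to the reference joints, two copies of the same pose seen from different horizontal viewing angles map to the same canonical frame. For a full sequence, one could instead compute `theta` once from the first frame and reuse it for every frame, which preserves whole-body turns that happen during an action.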




Multi-Dimensional Refinement Graph Convolutional Network with Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition

Graph convolutional networks have been widely used in skeleton-based act...

Exploiting deep residual networks for human action recognition from skeletal data

The computer vision community is currently focusing on solving action re...

Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition

In recent years, graph convolutional networks (GCNs) play an increasingl...

Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning

Skeleton-based action recognition has made great progress recently, but ...

Learning Discriminative Representations for Skeleton Based Action Recognition

Human action recognition aims at classifying the category of human actio...

Deep-Aligned Convolutional Neural Network for Skeleton-based Action Recognition and Segmentation

Convolutional neural networks (CNNs) are deep learning frameworks which ...

The role of ego vision in view-invariant action recognition

Analysis and interpretation of egocentric video data is becoming more an...
