Improve the Interpretability of Attention: A Fast, Accurate, and Interpretable High-Resolution Attention Model

by Tristan Gomez, et al.

The widespread use of attention mechanisms has raised concerns about the interpretability of attention distributions. Although attention provides insight into how a model operates, using it as an explanation of model predictions remains questionable. The community is still seeking more interpretable strategies for identifying the local active regions that contribute the most to the final decision. To improve the interpretability of existing attention models, we propose a novel Bilinear Representative Non-Parametric Attention (BR-NPA) strategy that captures task-relevant, human-interpretable information. The target model is first distilled to have higher-resolution intermediate feature maps. Representative features are then grouped from these maps based on local pairwise feature similarity, producing finer-grained, more precise attention maps that highlight task-relevant parts of the input. The obtained attention maps are ranked according to the 'active level' of the compound feature, which indicates the importance of the highlighted regions. The proposed model can be easily adapted to a wide variety of modern deep models that involve classification. It is also more accurate, faster, and has a smaller memory footprint than conventional neural attention modules. Extensive experiments showcase more comprehensive visual explanations than the state-of-the-art visualization model across multiple tasks, including few-shot classification, person re-identification, and fine-grained image classification. The proposed visualization model sheds light on how neural networks 'pay their attention' differently in different tasks.
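
The grouping and ranking idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it rests on several assumptions the abstract does not state: the 'active level' of a spatial feature is taken to be its L2 norm, similarity is cosine similarity, and representatives are picked greedily while down-weighting locations already covered. The function name `representative_attention_maps` and its parameters are hypothetical.

```python
import numpy as np


def representative_attention_maps(features, num_maps=3, eps=1e-8):
    """Sketch of non-parametric attention from a feature map.

    features: array of shape (H, W, C), e.g. a high-resolution CNN feature map.
    Returns (maps, reps): maps has shape (num_maps, H, W), ordered from the
    most to the least active representative; reps holds the flat indices of
    the chosen representative locations.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)

    # Assumption: the "active level" of a location is the L2 norm of its feature.
    activity = np.linalg.norm(flat, axis=1)
    unit = flat / (activity[:, None] + eps)

    remaining = activity.copy()
    maps, reps = [], []
    for _ in range(num_maps):
        # Pick the most active location that is not yet explained.
        idx = int(np.argmax(remaining))
        # Attention map = cosine similarity of every location to the representative.
        sim = np.clip(unit @ unit[idx], 0.0, 1.0)
        maps.append(sim.reshape(h, w))
        reps.append(idx)
        # Assumption: suppress locations already covered by this representative.
        remaining = remaining * (1.0 - sim)
    return np.stack(maps), reps


# Toy input: top half points along channel 0 (strong), bottom half along channel 1 (weak).
toy = np.zeros((4, 4, 2))
toy[:2, :, 0] = 3.0
toy[2:, :, 1] = 1.0
toy_maps, toy_reps = representative_attention_maps(toy, num_maps=2)
```

On this toy input, the first map should cover the strongly activated top half and the second map the weaker bottom half, which mirrors the ranking by active level that the abstract describes.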



Interpretable Attention Guided Network for Fine-grained Visual Classification

Fine-grained visual classification (FGVC) is challenging but more critic...

Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition

Fine-grained visual recognition typically depends on modeling subtle dif...

Task-Oriented Channel Attention for Fine-Grained Few-Shot Classification

The difficulty of the fine-grained image classification mainly comes fro...

Playing to distraction: towards a robust training of CNN classifiers through visual explanation techniques

The field of deep learning is evolving in different directions, with sti...

Microscopic fine-grained instance classification through deep attention

Fine-grained classification of microscopic image data with limited sampl...

Semantically Interpretable Activation Maps: what-where-how explanations within CNNs

A main issue preventing the use of Convolutional Neural Networks (CNN) i...

On Model Explanations with Transferable Neural Pathways

Neural pathways as model explanations consist of a sparse set of neurons...
