Where is the Model Looking At?–Concentrate and Explain the Network Attention

by Wenjia Xu, et al.
University of the Chinese Academy of Sciences

Image classification models have achieved satisfactory performance on many datasets, sometimes even surpassing humans. However, the model's attention is unclear due to the lack of interpretability. This paper investigates the fidelity and interpretability of model attention. We propose an Explainable Attribute-based Multi-task (EAT) framework to concentrate the model attention on the discriminative image area and make the attention interpretable. We introduce attribute prediction into the multi-task learning network, helping the network concentrate its attention on the foreground objects. We generate attribute-based textual explanations for the network and ground the attributes on the image to show visual explanations. The multi-modal explanation can not only improve user trust but also help to find the weaknesses of the network and the dataset. Our framework can be generalized to any basic model. We perform experiments on three datasets and five basic models. Results indicate that the EAT framework can give multi-modal explanations that interpret the network decision. The performance of several recognition approaches is improved by guiding the network attention.
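The abstract describes a multi-task setup in which an attribute-prediction head is trained jointly with the category classifier. A minimal sketch of such a joint objective is shown below; the weighting factor `lam`, the function names, and the exact loss form are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over class logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def multitask_loss(class_logits, class_label, attr_logits, attr_labels, lam=0.5):
    """Sketch of a joint multi-task objective: softmax cross-entropy for the
    category head plus binary cross-entropy for the attribute head.
    (lam and this exact formulation are assumptions for illustration.)"""
    # Classification head: negative log-likelihood of the true class.
    p = softmax(class_logits)
    ce = -np.log(p[class_label])
    # Attribute head: independent sigmoid output per binary attribute.
    q = 1.0 / (1.0 + np.exp(-attr_logits))
    bce = -np.mean(attr_labels * np.log(q) + (1 - attr_labels) * np.log(1 - q))
    return ce + lam * bce

# Example: 3 classes, 2 binary attributes.
loss = multitask_loss(np.array([2.0, 0.5, -1.0]), 0,
                      np.array([3.0, -3.0]), np.array([1.0, 0.0]))
```

Sharing a backbone between the two heads is what lets the auxiliary attribute signal pull the network's attention toward foreground object parts that carry the attributes.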


Multi-task CNN Model for Attribute Prediction

This paper proposes a joint multi-task learning algorithm to better pred...

eX-ViT: A Novel eXplainable Vision Transformer for Weakly Supervised Semantic Segmentation

Recently vision transformer models have become prominent models for a ra...

REX: Reasoning-aware and Grounded Explanation

Effectiveness and interpretability are two essential properties for trus...

A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations

Explainable deep learning models are advantageous in many situations. Pr...

Personalized Showcases: Generating Multi-Modal Explanations for Recommendations

Existing explanation models generate only text for recommendations but s...

Explain and Predict, and then Predict again

A desirable property of learning systems is to be both effective and int...

Model Explanations under Calibration

Explaining and interpreting the decisions of recommender systems are bec...
