EDDA: Explanation-driven Data Augmentation to Improve Model and Explanation Alignment

by Ruiwen Li, et al.

Recent years have seen the introduction of a range of methods for post-hoc explainability of image classifier predictions. However, these post-hoc explanations may not always align with classifier predictions, which poses a significant challenge when attempting to debug models based on such explanations. To this end, we seek a methodology that improves alignment between model predictions and explanations, that is agnostic to both the model class and the explanation class, and that does not require ground-truth explanations. We achieve this through a novel explanation-driven data augmentation (EDDA) method that augments the training data with occlusions of existing data derived from model explanations. It rests on a simple motivating principle: if the model and explainer are aligned, occluding regions the explainer marks as salient should decrease the model's confidence in its prediction, while occluding non-salient regions should leave the prediction unchanged. To verify that this augmentation improves model-explainer alignment, we evaluate the methodology on a variety of datasets, image classification models, and explanation methods. In all cases, our explanation-driven data augmentation improves alignment of the model and explanation compared with both no augmentation and non-explanation-driven augmentation. In conclusion, this approach provides a novel model- and explainer-agnostic methodology for improving alignment between model predictions and explanations, which we see as a critical step forward for the practical deployment and debugging of image classification models.
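The motivating principle above can be sketched as a small consistency check. The code below is a minimal illustration, not the authors' implementation: it assumes a `model` that maps an image to a scalar confidence and a `saliency` map of the same shape, and uses a hypothetical toy model whose saliency is known exactly (its own pixel weights), so that occluding salient pixels provably lowers confidence more than occluding non-salient ones.

```python
import numpy as np

def occlude(image, mask, fill=0.0):
    """Return a copy of `image` with pixels where `mask` is True set to `fill`."""
    out = image.copy()
    out[mask] = fill
    return out

def alignment_check(model, image, saliency, frac=0.1):
    """Check the EDDA motivating principle: occluding the top-`frac` most
    salient pixels should reduce confidence, while occluding the least
    salient pixels should leave it (nearly) unchanged."""
    k = max(1, int(frac * saliency.size))
    order = np.argsort(saliency, axis=None)
    top = np.zeros(saliency.shape, dtype=bool)
    top.flat[order[-k:]] = True           # most salient pixels
    bottom = np.zeros(saliency.shape, dtype=bool)
    bottom.flat[order[:k]] = True         # least salient pixels

    base = model(image)
    drop = base - model(occlude(image, top))       # should be large
    shift = abs(base - model(occlude(image, bottom)))  # should be small
    return drop, shift

# Toy model: confidence is a weighted mean of pixel intensities, so the
# weight map itself is a perfectly aligned saliency map for this model.
rng = np.random.default_rng(0)
weights = rng.random((8, 8))
model = lambda img: float((img * weights).sum() / weights.sum())
image = np.ones((8, 8))

drop, shift = alignment_check(model, image, saliency=weights, frac=0.1)
print(drop > shift)  # salient occlusion hurts confidence more
```

In EDDA itself, both occluded variants would be added to the training data so that the model is explicitly trained toward this behavior; the check here only measures how well a given model and explainer already agree.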



