What Do You See? Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors

by Yi-Shan Lin, et al.

EXplainable AI (XAI) methods have been proposed to interpret how a deep neural network arrives at a prediction, through saliency explanations that highlight the parts of an input deemed important to the decision for a specific target class. However, it remains challenging to quantify the correctness of their interpretability, as current evaluation approaches either require subjective input from humans or incur high computation cost with automated evaluation. In this paper, we propose using backdoor trigger patterns — hidden malicious functionalities that cause misclassification — to automate the evaluation of saliency explanations. Our key observation is that triggers provide ground truth for inputs, letting us evaluate whether the regions an XAI method identifies are truly relevant to the model's output. Since backdoor triggers are the most important features that cause deliberate misclassification, a robust XAI method should reveal their presence at inference time. We introduce three complementary metrics for the systematic evaluation of the explanations an XAI method generates, and we evaluate seven state-of-the-art model-free and model-specific post-hoc methods on 36 models trojaned with specifically crafted triggers varying in color, shape, texture, location, and size. We find that the six methods based on local explanation and feature relevance fail to completely highlight trigger regions, and that only a model-free approach can uncover the entire trigger region.
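The core idea — scoring a saliency map against the known trigger region — can be sketched as follows. This is a minimal illustration, not the paper's actual metrics: the function name `trigger_overlap`, the `top_frac` threshold, and the recall/IoU scores are assumptions chosen to show how a trigger mask can serve as ground truth for an explanation.

```python
import numpy as np

def trigger_overlap(saliency, trigger_mask, top_frac=0.05):
    """Score how well a saliency map recovers a known backdoor trigger.

    saliency     : 2-D array of per-pixel attribution scores
    trigger_mask : boolean 2-D array, True where the trigger was stamped
    top_frac     : fraction of pixels treated as "highlighted"
    """
    k = max(1, int(top_frac * saliency.size))
    # Mark the k most salient pixels as the explanation's highlighted region.
    top_idx = np.argsort(saliency, axis=None)[-k:]
    highlighted = np.zeros(saliency.size, dtype=bool)
    highlighted[top_idx] = True
    highlighted = highlighted.reshape(saliency.shape)

    inter = np.logical_and(highlighted, trigger_mask).sum()
    recall = inter / trigger_mask.sum()  # fraction of the trigger recovered
    iou = inter / np.logical_or(highlighted, trigger_mask).sum()
    return recall, iou

# Toy example: a 4x4 trigger in the corner of a 32x32 saliency map.
rng = np.random.default_rng(0)
sal = rng.random((32, 32))
mask = np.zeros((32, 32), dtype=bool)
mask[:4, :4] = True
sal[:4, :4] += 1.0  # a faithful explanation scores the trigger region highly
rec, iou = trigger_overlap(sal, mask, top_frac=0.05)
```

A method that fully highlights the trigger yields recall 1.0; one that highlights only part of it (as the paper reports for six of the seven methods) scores lower.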




FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods

The field of explainable artificial intelligence (XAI) aims to uncover t...

On the Robustness of Interpretability Methods

We argue that robustness of explanations---i.e., that similar inputs sho...

Explainable Deep Classification Models for Domain Generalization

Conventionally, AI models are thought to trade off explainability for lo...

Precise Benchmarking of Explainable AI Attribution Methods

The rationale behind a deep learning model's output is often difficult t...

SIDU: Similarity Difference and Uniqueness Method for Explainable AI

A new brand of technical artificial intelligence (Explainable AI) rese...

Sanity Checks for Saliency Maps

Saliency methods have emerged as a popular tool to highlight features in...

Finding the right XAI method – A Guide for the Evaluation and Ranking of Explainable AI Methods in Climate Science

Explainable artificial intelligence (XAI) methods shed light on the pred...
