Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance

04/13/2023
by   Jonathan Crabbé, et al.

Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully describes this type of model must agree with this invariance property. We formalize this intuition through the notions of explanation invariance and equivariance by leveraging the formalism of geometric deep learning. Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model's symmetry group; (2) theoretical robustness guarantees for several popular interpretability methods; and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group. By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of five guidelines to help users and developers of interpretability methods produce robust explanations.
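The abstract's two metrics compare an explanation of a transformed input against the (possibly transformed) explanation of the original input. The paper defines these formally via the model's symmetry group; the sketch below is only a rough illustration of that idea, with assumed function names and cosine similarity as the comparison score, neither of which is taken from the paper.

```python
import numpy as np

def _cosine(a, b):
    """Cosine similarity between two flattened arrays."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def explanation_invariance(explain, x, group_actions, similarity=_cosine):
    """Average similarity between explain(g(x)) and explain(x) over group elements g.

    A score near 1 means the explanation is invariant under the symmetry group,
    as one would expect for an explanation of an invariant model (e.g. a graph
    classifier under node permutations).
    """
    e_x = explain(x)
    return float(np.mean([similarity(explain(g(x)), e_x) for g in group_actions]))

def explanation_equivariance(explain, x, group_actions, similarity=_cosine):
    """Average similarity between explain(g(x)) and g(explain(x)).

    Appropriate for feature attributions, which should transform along with
    the input (e.g. a saliency map of a flipped image should flip too).
    """
    return float(np.mean([similarity(explain(g(x)), g(explain(x)))
                          for g in group_actions]))
```

For example, a constant explainer scores 1.0 on invariance under any group action, while an identity explainer (attribution equal to the input) scores 1.0 on equivariance under a flip, since flipping the input flips the attribution in the same way.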

Related research:

- 06/21/2018, On the Robustness of Interpretability Methods: We argue that robustness of explanations, i.e., that similar inputs sho...
- 10/29/2019, Weight of Evidence as a Basis for Human-Oriented Explanations: Interpretability is an elusive but highly sought-after characteristic of...
- 06/19/2019, Explanations can be manipulated and geometry is to blame: Explanation methods aim to make neural networks more trustworthy and int...
- 10/07/2022, In What Ways Are Deep Neural Networks Invariant and How Should We Measure This?: It is often said that a deep learning model is "invariant" to some speci...
- 01/08/2023, Equivariant and Steerable Neural Networks: A review with special emphasis on the symmetric group: Convolutional neural networks revolutionized computer vision and natural...
- 08/16/2023, Interpretability Benchmark for Evaluating Spatial Misalignment of Prototypical Parts Explanations: Prototypical parts-based networks are becoming increasingly popular due...
- 05/18/2022, The Solvability of Interpretability Evaluation Metrics: Feature attribution methods are popular for explaining neural network pr...
