When Explanations Lie: Why Modified BP Attribution Fails
Modified backpropagation methods are a popular family of attribution methods. We analyse the most prominent ones: Deep Taylor Decomposition, Layer-wise Relevance Propagation, Excitation BP, PatternAttribution, Deconv, and Guided BP. Empirically, we find that the explanations produced by these modified BP methods are independent of the parameters of later layers, and we show that the z^+ rule used by several of them converges to a rank-1 matrix. This explains why the network's actual decision is ignored. We also develop a new metric, cosine similarity convergence (CSC), to directly quantify the convergence of modified BP methods to a rank-1 matrix. We conclude that many modified BP methods do not faithfully explain the predictions of deep neural networks.
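The rank-1 convergence can be illustrated with a minimal numerical sketch: the z^+ rule propagates relevance through the positive parts of the weights only, so the backward pass behaves like a product of non-negative matrices, which converges to a rank-1 matrix as depth grows. The matrix sizes, depth, and the column-wise cosine check below are illustrative assumptions, not the paper's CSC metric itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical layer width and depth for illustration.
dim, depth = 50, 20

product = np.eye(dim)
sims = []
for _ in range(depth):
    # Positive part of a random weight matrix, mimicking the z^+ rule's
    # restriction to positive weights.
    W_plus = np.clip(rng.normal(size=(dim, dim)), 0.0, None)
    product = W_plus @ product
    product /= np.linalg.norm(product)  # normalise to avoid overflow
    # If the product is (close to) rank 1, all columns are parallel,
    # so the cosine similarity between any two columns approaches 1.
    sims.append(cosine(product[:, 0], product[:, 1]))

print(f"column cosine similarity after 1 layer:      {sims[0]:.3f}")
print(f"column cosine similarity after {depth} layers:    {sims[-1]:.3f}")
```

After a handful of layers the column cosine similarity is essentially 1, i.e. the accumulated backward matrix is rank-1 up to numerical precision, independently of anything the later layers computed.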