NormGrad: Finding the Pixels that Matter for Training
The different families of saliency methods, whether based on contrastive signals, closed-form formulas mixing gradients with activations, or perturbation masks, all focus on which parts of an image are responsible for the model's inference. In this paper, we are instead interested in the locations of an image that contribute to the model's training. First, we propose a principled attribution method that we extract from the summation formula used to compute the gradient of the weights for a 1x1 convolutional layer. The resulting formula is fast to compute and can be used throughout the network, allowing us to efficiently produce fine-grained importance maps. We show how to extend it in order to compute saliency maps at any targeted point within the network. Second, to make the attribution specific to the training of the model, we introduce a meta-learning approach for saliency methods by considering an inner optimisation step within the loss. This way, we do not aim at identifying the parts of an image that contribute to the model's output, but rather the locations that are responsible for the model training well on this image. Conversely, we also show that a similar meta-learning approach can be used to extract the adversarial locations that can lead to the degradation of the model.
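As a minimal sketch of the summation formula mentioned above: a 1x1 convolution applies the same weight matrix W at every spatial location u, so the weight gradient is a sum over locations of the outer product between the output gradient g[u] and the input feature x[u]. The function names and toy values below are hypothetical, and scoring each location by the Frobenius norm of its term is an assumption suggested by the method's name rather than something stated in the abstract.

```python
import math

def per_location_terms(x, g):
    # x[u]: input feature vector at location u (length C_in)
    # g[u]: gradient of the loss w.r.t. the layer output at u (length C_out)
    # For y[u] = W x[u], location u contributes outer(g[u], x[u]) to dL/dW.
    return [[[g_k * x_c for x_c in x_u] for g_k in g_u]
            for x_u, g_u in zip(x, g)]

def weight_gradient(terms):
    # dL/dW is the element-wise sum of the per-location terms.
    rows, cols = len(terms[0]), len(terms[0][0])
    return [[sum(t[k][c] for t in terms) for c in range(cols)]
            for k in range(rows)]

def saliency(terms):
    # Assumed scoring rule: Frobenius norm of each location's term,
    # which equals ||g[u]|| * ||x[u]|| for an outer product.
    return [math.sqrt(sum(v * v for row in t for v in row)) for t in terms]

# Toy data: 3 spatial locations (flattened), C_in = 2, C_out = 2.
x = [[1.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
g = [[0.0, 2.0], [1.0, 0.0], [5.0, 5.0]]

terms = per_location_terms(x, g)
grad_w = weight_gradient(terms)   # [[3.0, 4.0], [2.0, 0.0]]
scores = saliency(terms)          # [2.0, 5.0, 0.0]
```

In this toy case the third location has a large output gradient but a zero input feature, so it contributes nothing to the weight update and receives a score of zero. This illustrates why such a map differs from one built on gradients alone.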