A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes

by Mazda Moayeri, et al.

While datasets with single-label supervision have propelled rapid advances in image classification, additional annotations are needed to quantitatively assess how models make their predictions. To this end, we collect segmentation masks for the entire object and 18 informative attributes for a subset of ImageNet samples. We call this dataset RIVAL10 (RIch Visual Attributes with Localization); it consists of roughly 26k instances over 10 classes. Using RIVAL10, we evaluate the sensitivity of a broad set of models to noise corruptions in foregrounds, backgrounds, and attributes. Our analysis covers diverse state-of-the-art architectures (ResNets, Transformers) and training procedures (CLIP, SimCLR, DeiT, adversarial training). We find, somewhat surprisingly, that adversarial training makes ResNets more sensitive to backgrounds relative to foregrounds than standard training does. Similarly, contrastively trained models, whether transformers or ResNets, also have lower relative foreground sensitivity. Lastly, we observe an intriguing adaptive ability of transformers: their relative foreground sensitivity increases as the corruption level increases. Using saliency methods, we automatically discover spurious features that drive the background sensitivity of models and assess how well saliency maps align with foregrounds. Finally, we quantitatively study the attribution problem for neural features by comparing feature saliency with ground-truth localization of semantic attributes.
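The corruption protocol described above can be sketched as follows: segmentation masks restrict Gaussian noise to either the foreground or the background, and the resulting accuracy drops are compared. This is a minimal illustrative sketch, not the paper's implementation; the function names (`corrupt_region`, `relative_fg_sensitivity`) and the particular sensitivity ratio are assumptions for exposition.

```python
import numpy as np

def corrupt_region(image, mask, sigma, rng):
    """Add Gaussian noise (std sigma) only where mask == 1.

    image: HxWxC array with values in [0, 1]; mask: HxW binary array
    (e.g., a RIVAL10 foreground mask, or its complement for the background).
    """
    noise = rng.normal(0.0, sigma, size=image.shape)
    corrupted = image + noise * mask[..., None]
    return np.clip(corrupted, 0.0, 1.0)

def relative_fg_sensitivity(acc_clean, acc_fg_noise, acc_bg_noise):
    """Fraction of the total accuracy drop attributable to foreground noise.

    Values above 0.5 indicate the model is hurt more by corrupting the
    foreground than the background at this noise level.
    """
    drop_fg = acc_clean - acc_fg_noise
    drop_bg = acc_clean - acc_bg_noise
    return drop_fg / (drop_fg + drop_bg)
```

For example, a model whose accuracy falls from 0.9 to 0.6 under foreground noise but only to 0.8 under background noise has a relative foreground sensitivity of 0.75; sweeping `sigma` reproduces the kind of corruption-level curves discussed above.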




