Identifying Adversarially Attackable and Robust Samples
This work proposes a novel perspective on adversarial attacks by introducing the concept of sample attackability and robustness. Adversarial attacks insert small, imperceptible perturbations to the input that cause large, undesired changes to the output of deep learning models. Despite extensive research on generating adversarial attacks and building defense systems, there has been limited research on understanding adversarial attacks from an input-data perspective. We propose a deep-learning-based method for detecting the most attackable and robust samples in an unseen dataset for an unseen target model. The proposed method is based on a neural network architecture that takes as input a sample and outputs a measure of attackability or robustness. The proposed method is evaluated using a range of different models and different attack methods, and the results demonstrate its effectiveness in detecting the samples that are most likely to be affected by adversarial attacks. Understanding sample attackability can have important implications for future work in sample-selection tasks. For example in active learning, the acquisition function can be designed to select the most attackable samples, or in adversarial training, only the most attackable samples are selected for augmentation.
READ FULL TEXT