Modelling Adversarial Noise for Adversarial Defense
Deep neural networks have been shown to be vulnerable to adversarial noise, which has prompted the development of defenses against adversarial attacks. Existing adversarial defenses typically focus on directly exploiting adversarial examples to remove adversarial noise or to train an adversarially robust target model. Motivated by the observation that the relationship between adversarial data and natural data can help infer clean data from adversarial data and thus obtain the correct final prediction, in this paper we model adversarial noise by learning the transition relationship in the label space, so that adversarial labels can be used to improve adversarial accuracy. Specifically, we introduce a transition matrix that relates adversarial labels to true labels. By exploiting the transition matrix, we can directly infer clean labels from adversarial labels. We then propose to employ a deep neural network (i.e., a transition network) to model the instance-dependent transition matrix from adversarial noise. In addition, we conduct joint adversarial training on the target model and the transition network to achieve optimal performance. Empirical evaluations on benchmark datasets demonstrate that our method can significantly improve adversarial accuracy in comparison to state-of-the-art methods.
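To make the label-space idea concrete, the following is a minimal sketch of how a transition matrix could map a prediction over adversarial labels to a prediction over clean labels. It is an illustration under stated assumptions, not the paper's implementation: the function name `infer_clean_posterior`, the row-stochastic convention `T[i][j] = P(true label = j | adversarial label = i)`, and the hand-picked numbers are all hypothetical. In the method described above, the matrix would be produced per instance by the transition network rather than fixed by hand.

```python
def infer_clean_posterior(p_adv, T):
    """Combine an adversarial-label posterior with a transition matrix.

    Assumed convention (hypothetical, for illustration only):
    p_adv[i] = P(adversarial label = i | input)
    T[i][j]  = P(true label = j | adversarial label = i, input)
    Returns a list q with q[j] = sum_i p_adv[i] * T[i][j].
    """
    n = len(p_adv)
    for row in T:
        if len(row) != n or abs(sum(row) - 1.0) > 1e-8:
            raise ValueError("T must be square with rows summing to 1")
    return [sum(p_adv[i] * T[i][j] for i in range(n)) for j in range(n)]


# Toy 3-class example with made-up numbers: the target model is fooled
# into predicting class 1 on the adversarial input.
p_adv = [0.1, 0.8, 0.1]

# Hand-written matrix standing in for the transition network's output.
# Row 1 says: when the adversarial label is 1, the true label is
# most likely 0.
T = [
    [0.90, 0.05, 0.05],
    [0.70, 0.20, 0.10],
    [0.05, 0.05, 0.90],
]

p_clean = infer_clean_posterior(p_adv, T)
adv_label = max(range(len(p_adv)), key=lambda k: p_adv[k])
clean_label = max(range(len(p_clean)), key=lambda k: p_clean[k])
print(adv_label, clean_label)
```

In this toy case the adversarial label is class 1 while the inferred clean label is class 0, which is the kind of correction the transition matrix is meant to provide.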