Improving the Generalization of Adversarial Training with Domain Adaptation

by Chuanbiao Song et al.

By injecting adversarial examples into the training data, adversarial training is a promising method for improving the robustness of deep learning models. However, most existing adversarial training approaches are based on a specific type of adversarial attack, which may not provide sufficiently representative samples from the adversarial domain and thus leads to weak generalization to adversarial examples from other attacks. To scale to large datasets, the input perturbations used to generate adversarial examples are usually crafted with fast single-step attacks. This work focuses on adversarial training with the single-step yet efficient FGSM adversary. In this scenario, it is difficult to train a model that generalizes well because representative adversarial samples are lacking, i.e., the samples cannot accurately reflect the adversarial domain. To address this problem, we propose a novel Adversarial Training with Domain Adaptation (ATDA) method, which regards adversarial training with the FGSM adversary as a domain adaptation task with a limited number of target-domain samples. The main idea is to learn a representation that is semantically meaningful and domain-invariant across both the clean domain and the adversarial domain. Empirical evaluations demonstrate that ATDA greatly improves the generalization of adversarial training and achieves state-of-the-art results on standard benchmark datasets.
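To make the two ingredients of the abstract concrete, the following is a minimal NumPy sketch of (a) a one-step FGSM perturbation, shown here on a simple logistic-regression model rather than a deep network, and (b) a CORAL-style covariance-alignment term of the kind used in unsupervised domain adaptation to pull clean and adversarial feature distributions together. The function names `fgsm` and `coral_distance` are illustrative, and this is not the paper's actual ATDA implementation, which combines supervised and unsupervised alignment losses during training of a deep model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One-step FGSM for binary logistic regression.

    For the cross-entropy loss, the gradient w.r.t. the input x of
    sample i is (p_i - y_i) * w, so the attack adds eps * sign(grad).
    """
    p = sigmoid(x @ w + b)
    grad_x = np.outer(p - y, w)        # per-sample input gradient
    return x + eps * np.sign(grad_x)   # single-step bounded perturbation

def coral_distance(f_clean, f_adv):
    """Squared Frobenius distance between the feature covariances of the
    clean and adversarial batches (a CORAL-style alignment penalty)."""
    c_clean = np.cov(f_clean, rowvar=False)
    c_adv = np.cov(f_adv, rowvar=False)
    return float(np.sum((c_clean - c_adv) ** 2))

# Toy data: 64 samples, 5 features, label from the sign of feature 0.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 5))
y = (x[:, 0] > 0).astype(float)
w = rng.normal(size=5)
b = 0.0

x_adv = fgsm(x, y, w, b, eps=0.1)
alignment_penalty = coral_distance(x, x_adv)
```

During adversarial training, a penalty like `alignment_penalty` would be added to the classification loss so that the learned representation cannot drift apart between the clean and adversarial domains, which is the domain-invariance idea the abstract describes.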


Class-Aware Domain Adaptation for Improving Adversarial Robustness

Recent works have demonstrated convolutional neural networks are vulnera...

Adv-4-Adv: Thwarting Changing Adversarial Perturbations via Adversarial Domain Adaptation

Whereas adversarial training can be useful against specific adversarial ...

A Closer Look at Smoothness in Domain Adversarial Training

Domain adversarial training has been ubiquitous for achieving invariant ...

Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

Backdoor attacks are serious security threats to machine learning models...

On the Connection between Invariant Learning and Adversarial Training for Out-of-Distribution Generalization

Despite impressive success in many tasks, deep learning models are shown...

Deepfake Forensics via An Adversarial Game

With the progress in AI-based facial forgery (i.e., deepfake), people ar...

Improving Health Mentioning Classification of Tweets using Contrastive Adversarial Training

Health mentioning classification (HMC) classifies an input text as healt...
