Distinguishing Non-natural from Natural Adversarial Samples for More Robust Pre-trained Language Model

03/19/2022
by Jiayi Wang, et al.

The robustness of pre-trained language models (PrLMs) has recently received increasing research interest. The latest studies on adversarial attacks achieve high attack success rates against PrLMs and conclude that PrLMs are not robust. However, we find that the adversarial samples on which PrLMs fail are mostly non-natural and do not appear in reality. We question the validity of the current evaluation of PrLM robustness based on these non-natural adversarial samples, and propose an anomaly detector to evaluate the robustness of PrLMs with more natural adversarial samples. We also investigate two applications of the anomaly detector: (1) In data augmentation, we employ the anomaly detector to force the generation of augmented data that are distinguished as non-natural, which brings larger gains in the accuracy of PrLMs. (2) We apply the anomaly detector to a defense framework that enhances the robustness of PrLMs. The framework can be used to defend against all types of attacks and achieves higher accuracy on both adversarial samples and compliant samples than other defense frameworks.
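At its core, the anomaly detector is a classifier that separates natural text from non-natural (adversarially perturbed) text. The sketch below shows one plausible way to build such a detector by fine-tuning a PrLM for binary classification; it is not the paper's exact architecture, and the checkpoint name (bert-base-uncased), the toy training pairs, and the decision threshold are illustrative assumptions only.

```python
# Minimal sketch (not the authors' exact method): fine-tune a PrLM as a binary
# "naturalness" classifier. Label 0 = natural text, label 1 = non-natural
# (e.g., adversarially perturbed) text. Training pairs below are hypothetical.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint; any PrLM would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
detector = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy illustrative data: (sentence, label), where 1 marks a perturbed sample.
train_pairs = [
    ("the movie was a delight from start to finish", 0),
    ("the moviie was a delihgt from statr to finish", 1),   # character swaps
    ("an utterly forgettable and dull film", 0),
    ("an utterly forgettable and monotonic celluloid", 1),  # odd synonym swaps
]

def encode(batch):
    # Collate a list of (text, label) tuples into a model-ready batch.
    texts, labels = zip(*batch)
    enc = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    enc["labels"] = torch.tensor(labels)
    return enc

loader = DataLoader(train_pairs, batch_size=2, shuffle=True, collate_fn=encode)
optimizer = torch.optim.AdamW(detector.parameters(), lr=2e-5)

detector.train()
for epoch in range(3):
    for batch in loader:
        out = detector(**batch)  # cross-entropy loss over the two labels
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

detector.eval()
def is_non_natural(text, threshold=0.5):
    # Flag inputs whose predicted "non-natural" probability exceeds the threshold.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(detector(**enc).logits, dim=-1)
    return probs[0, 1].item() > threshold
```

In a defense setting of the kind described above, inputs that such a detector flags as non-natural could be rejected or handled separately before reaching the downstream PrLM classifier; in the evaluation setting, only samples the detector judges natural would count toward robustness measurements.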


Related research

05/30/2021 · Defending Pre-trained Language Models from Adversarial Word Substitutions Without Performance Sacrifice
Pre-trained contextualized language models (PrLMs) have led to strong pe...

06/11/2020 · Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification
Recently adversarial attacks on automatic speaker verification (ASV) sys...

02/28/2019 · Enhancing the Robustness of Deep Neural Networks by Boundary Conditional GAN
Deep neural networks have been widely deployed in various machine learni...

03/17/2023 · Robust Mode Connectivity-Oriented Adversarial Defense: Enhancing Neural Network Robustness Against Diversified ℓ_p Attacks
Adversarial robustness is a key concept in measuring the ability of neur...

02/07/2020 · Can't Boil This Frog: Robustness of Online-Trained Autoencoder-Based Anomaly Detectors to Adversarial Poisoning Attacks
In recent years, a variety of effective neural network-based methods for...

12/13/2020 · DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation
Public resources and services (e.g., datasets, training platforms, pre-t...

05/29/2023 · From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Textual adversarial attacks can discover models' weaknesses by adding se...
