Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer

by   Fanchao Qi, et al.

Adversarial attacks and backdoor attacks are two common security threats that hang over deep learning. Both of them harness task-irrelevant features of data in their implementation. Text style is a feature that is naturally irrelevant to most NLP tasks, and thus suitable for adversarial and backdoor attacks. In this paper, we make the first attempt to conduct adversarial and backdoor attacks based on text style transfer, which is aimed at altering the style of a sentence while preserving its meaning. We design an adversarial attack method and a backdoor attack method, and conduct extensive experiments to evaluate them. Experimental results show that popular NLP models are vulnerable to both adversarial and backdoor attacks based on text style transfer – the attack success rates can exceed 90 ability of NLP models to handle the feature of text style that has not been widely realized. In addition, the style transfer-based adversarial and backdoor attack methods show superiority to baselines in many aspects. All the code and data of this paper can be obtained at https://github.com/thunlp/StyleAttack.


page 1

page 2

page 3

page 4


Learning to Generate Multiple Style Transfer Outputs for an Input Sentence

Text style transfer refers to the task of rephrasing a given text in a d...

Contextualizing Variation in Text Style Transfer Datasets

Text style transfer involves rewriting the content of a source sentence ...

Improving the Transferability of Adversarial Examples with Arbitrary Style Transfer

Deep neural networks are vulnerable to adversarial examples crafted by a...

NAST: A Non-Autoregressive Generator with Word Alignment for Unsupervised Text Style Transfer

Autoregressive models have been widely used in unsupervised text style t...

A White-Box False Positive Adversarial Attack Method on Contrastive Loss-Based Offline Handwritten Signature Verification Models

In this paper, we tackle the challenge of white-box false positive adver...

Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP

Textual adversarial samples play important roles in multiple subfields o...

BadPrompt: Backdoor Attacks on Continuous Prompts

The prompt-based learning paradigm has gained much research attention re...

Please sign up or login with your details

Forgot password? Click here to reset