Distillation-Resistant Watermarking for Model Protection in NLP

10/07/2022
by Xuandong Zhao, et al.

How can we protect the intellectual property of trained NLP models? Modern NLP models are prone to being stolen by querying and distilling from their publicly exposed APIs. Existing protection methods such as watermarking, however, were designed for images and are not applicable to text. We propose Distillation-Resistant Watermarking (DRW), a novel technique that protects NLP models from being stolen via distillation. DRW protects a model by injecting into the victim model's prediction probabilities a watermark that corresponds to a secret key, and it can detect that key by probing a suspect model. We prove that a protected model retains its original accuracy within a certain bound. We evaluate DRW on a diverse set of NLP tasks, including text classification, part-of-speech tagging, and named entity recognition. Experiments show that DRW protects the original model and detects stealing suspects at 100% precision on all four tasks, while the prior method fails on two.
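The abstract only sketches the mechanism, so the following is a minimal, hypothetical Python/NumPy illustration of the probe-and-detect idea: perturb the victim's output probabilities with a periodic signal keyed by a secret projection of the query, then test whether a suspect model's outputs carry that signal. This is a sketch under assumptions, not the paper's actual algorithm; the sinusoidal perturbation, the correlation-based detector, and all names (`watermark_probs`, `detect_watermark`, `epsilon`, `secret_key`) are illustrative.

```python
import numpy as np

def watermark_probs(probs, features, secret_key, epsilon=0.05):
    """Perturb output probabilities with a keyed periodic signal.

    Hypothetical sketch: the perturbation is a sinusoid of a secret
    1-D projection of the query features, so a model distilled from
    these outputs tends to inherit the periodic signal.
    """
    phase = features @ secret_key        # secret scalar projection per query
    signal = epsilon * np.sin(phase)     # periodic watermark signal
    out = probs.copy()
    out[:, 0] += signal                  # shift mass between two classes
    out[:, 1] -= signal                  # so each row still sums to 1
    return np.clip(out, 0.0, 1.0)

def detect_watermark(suspect_probs, features, secret_key, threshold=5.0):
    """Probe a suspect model and test for the keyed periodic signal.

    Correlates the suspect's class-0 probability with the sinusoid for
    the true key, and compares against the same statistic under random
    keys (a crude significance test).
    """
    phase = features @ secret_key
    response = suspect_probs[:, 0] - suspect_probs[:, 0].mean()
    score = abs(np.dot(response, np.sin(phase)))
    null = [abs(np.dot(response, np.sin(features @ np.random.randn(features.shape[1]))))
            for _ in range(200)]
    z = (score - np.mean(null)) / (np.std(null) + 1e-12)
    return z > threshold, z

# Toy usage: the victim answers queries with watermarked probabilities.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))          # query features
key = rng.normal(size=16)                # secret key vector
clean = np.full((1000, 2), 0.5)          # placeholder victim outputs
marked = watermark_probs(clean, X, key)
flag, z = detect_watermark(marked, X, key)
print(f"watermark detected: {flag} (z = {z:.1f})")
```

The real DRW method specifies how the perturbation is constructed and comes with the accuracy bound mentioned above; this sketch only conveys the overall shape of injecting a key-dependent signal at query time and later probing a suspect model for it.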

