Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Watermarking

03/20/2023
by Ruixiang Tang, et al.

The abundance of training data available on the Internet has been a key factor in the success of deep learning models. However, this wealth of publicly available data also raises concerns about the unauthorized exploitation of datasets for commercial purposes, which is forbidden by dataset licenses. In this paper, we propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data. By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders. This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally. Unfortunately, existing backdoor insertion methods often entail adding arbitrary and mislabeled data to the training set, leading to a significant drop in performance and easy detection by anomaly detection algorithms. To overcome this challenge, we introduce a clean-label backdoor watermarking framework that uses imperceptible perturbations in place of mislabeled samples. As a result, the watermarking samples remain consistent with their original labels, making them difficult to detect. Our experiments on text, image, and audio datasets demonstrate that the proposed framework effectively safeguards datasets with minimal impact on original task performance. We also show that adding just 1 sample can inject a traceable watermarking function, and that our watermarking samples are stealthy and look benign upon visual inspection.
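The abstract's core idea can be sketched in a few lines: perturb a handful of correctly labeled samples with an imperceptible, bounded trigger, then verify ownership by checking whether a suspect model maps triggered inputs to the expected class far more often than chance. The sketch below is illustrative only, assuming a simple additive trigger on images in [0, 1]; all function names, the epsilon bound, and the detection threshold are assumptions, not the paper's actual method.

```python
import numpy as np

def make_watermark_samples(images, labels, target_class, trigger,
                           eps=8 / 255, n_watermark=10):
    """Clean-label watermarking sketch: add a bounded trigger to a few
    samples of the target class while KEEPING their original labels
    (unlike classic backdoors, no mislabeled data is injected)."""
    idx = np.flatnonzero(labels == target_class)[:n_watermark]
    bounded = np.clip(trigger, -eps, eps)          # imperceptible perturbation
    wm_images = np.clip(images[idx] + bounded, 0.0, 1.0)
    return wm_images, labels[idx]                  # labels stay consistent

def verify_watermark(model_predict, probe_images, trigger, target_class,
                     eps=8 / 255, threshold=0.9):
    """Ownership check: a model trained on the watermarked set should
    classify triggered probes as the target class at a rate well above
    chance; an unrelated model should not."""
    bounded = np.clip(trigger, -eps, eps)
    probes = np.clip(probe_images + bounded, 0.0, 1.0)
    hit_rate = float(np.mean(model_predict(probes) == target_class))
    return hit_rate >= threshold, hit_rate
```

A usage pass with a dummy classifier: build watermark samples for class 1, mix them into the training set, then later call `verify_watermark` on held-out probes to decide whether a third-party model was trained on the protected data.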

Related research

- OIAD: One-for-all Image Anomaly Detection with Disentanglement Learning (01/18/2020)
  Anomaly detection aims to recognize samples with anomalous and unusual p...

- Segmentation-Based Deep-Learning Approach for Surface-Defect Detection (03/20/2019)
  Automated surface-anomaly detection using machine learning has become an...

- Kallima: A Clean-label Framework for Textual Backdoor Attacks (06/03/2022)
  Although Deep Neural Network (DNN) has led to unprecedented progress in ...

- Do We Really Need Gold Samples for Sample Weighting Under Label Noise? (04/19/2021)
  Learning with labels noise has gained significant traction recently due ...

- Robust Learning of Deep Time Series Anomaly Detection Models with Contaminated Training Data (08/03/2022)
  Time series anomaly detection (TSAD) is an important data mining task wi...

- Learnability Lock: Authorized Learnability Control Through Adversarial Invertible Transformations (02/03/2022)
  Owing much to the revolution of information technology, the recent progr...

- Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples (12/31/2022)
  There is a growing interest in developing unlearnable examples (UEs) aga...
