On the Privacy Effect of Data Enhancement via the Lens of Memorization

08/17/2022
by Xiao Li, et al.

Machine learning raises severe privacy concerns, as learned models have been shown to reveal sensitive information about their training data. Many works have investigated the effect of the widely adopted data augmentation (DA) and adversarial training (AT) techniques, collectively termed data enhancement in this paper, on the privacy leakage of machine learning models. Such privacy effects are often measured by membership inference attacks (MIAs), which aim to identify whether a particular example belongs to the training set. We propose to investigate privacy from a new perspective called memorization. Through the lens of memorization, we find that previously deployed MIAs produce misleading results: they are less likely to identify samples with higher privacy risks as members than samples with lower privacy risks. To address this problem, we deploy a recent attack that can capture the memorization degree of individual samples for evaluation. Through extensive experiments, we unveil non-trivial findings about the connections between three important properties of machine learning models: privacy, generalization gap, and adversarial robustness. We demonstrate that, contrary to existing results, the generalization gap is not highly correlated with privacy leakage. Moreover, stronger adversarial robustness does not necessarily imply that the model is more susceptible to privacy attacks.
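For context, a common formalization of per-sample memorization (Feldman, 2020) measures how much a single example influences the model's prediction on itself: mem(A, S, i) = Pr_{h ~ A(S)}[h(x_i) = y_i] - Pr_{h ~ A(S \ {i})}[h(x_i) = y_i], i.e., the drop in accuracy on example i when it is left out of the training set S. Likewise, the simplest membership inference attacks threshold a per-example statistic such as the training loss. The sketch below is a minimal, generic loss-threshold MIA for illustration only; it is not the memorization-aware attack evaluated in the paper, and the model, data, and threshold names are placeholders.

    import numpy as np

    def loss_threshold_mia(probs, labels, threshold):
        """Guess 'member' when the cross-entropy loss falls below a threshold.

        probs:     (n, num_classes) class probabilities from the target model
        labels:    (n,) integer ground-truth labels
        threshold: loss cutoff, typically calibrated on shadow models
        """
        eps = 1e-12  # avoid log(0) on confident predictions
        losses = -np.log(probs[np.arange(len(labels)), labels] + eps)
        return losses < threshold  # True = predicted training member

    # Hypothetical usage (target_model, x, y are placeholders):
    # probs = target_model.predict_proba(x)
    # is_member = loss_threshold_mia(probs, y, threshold=0.5)

In these terms, the paper's observation is that attacks of this flavor tend to flag easy, well-fit examples as members while missing the highly memorized samples that carry the most privacy risk.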

Related research

03/24/2020 - Systematic Evaluation of Privacy Risks of Machine Learning Models
Machine learning models are prone to memorizing sensitive data, making t...

10/07/2021 - The Connection between Out-of-Distribution Generalization and Privacy of ML Models
With the goal of generalizing to out-of-distribution (OOD) data, recent ...

06/21/2022 - The Privacy Onion Effect: Memorization is Relative
Machine learning models trained on private datasets have been shown to l...

03/17/2022 - Leveraging Adversarial Examples to Quantify Membership Information Leakage
The use of personal data for training machine learning systems comes wit...

02/08/2021 - Quantifying and Mitigating Privacy Risks of Contrastive Learning
Data is the key factor to drive the development of machine learning (ML)...

06/08/2020 - Provable trade-offs between private & robust machine learning
Historically, machine learning methods have not been designed with secur...

05/12/2023 - Comparison of machine learning models applied on anonymized data with different techniques
Anonymization techniques based on obfuscating the quasi-identifiers by m...
