Plausible Deniability for Privacy-Preserving Data Synthesis

by Vincent Bindschaedler, et al.

Releasing full data records is one of the most challenging problems in data privacy. On the one hand, many popular techniques such as data de-identification are problematic because they depend on the background knowledge of adversaries. On the other hand, rigorous methods such as the exponential mechanism for differential privacy are often computationally impractical for releasing high-dimensional data, or cannot preserve high utility of the original data because of their extensive data perturbation. This paper presents a criterion called plausible deniability that provides a formal privacy guarantee, notably for releasing sensitive datasets: an output record can be released only if a certain number of input records are indistinguishable, up to a privacy parameter. This notion does not depend on the background knowledge of an adversary, and it can be checked efficiently by privacy tests. We present mechanisms to generate synthetic datasets that have the same format as the input data and similar statistical properties. We study this technique both theoretically and experimentally. A key theoretical result shows that, with proper randomization, the plausible deniability mechanism generates differentially private synthetic data. We demonstrate the efficiency of this generative technique on a large dataset; it is shown to preserve the utility of the original data with respect to various statistical analysis and machine learning measures.
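To make the release criterion concrete, the following is a minimal sketch of a privacy test of this kind, not the paper's implementation. It assumes a hypothetical function `prob(record, candidate)` that returns the probability of generating `candidate` from `record`, a hypothetical count threshold `k`, and a hypothetical indistinguishability parameter `gamma`; the seed record is counted among the plausible records, and the toy bit-flipping model is invented purely for illustration. The threshold here is deterministic, so the sketch does not include the randomization that the abstract ties to differential privacy.

```python
from typing import Callable, Sequence, Tuple

Record = Tuple[int, ...]


def passes_privacy_test(
    seed: Record,
    candidate: Record,
    dataset: Sequence[Record],
    prob: Callable[[Record, Record], float],
    k: int,
    gamma: float,
) -> bool:
    """Return True if at least k input records (seed included) could
    plausibly have produced `candidate`, up to a factor of gamma.

    All parameter names are assumptions made for this sketch.
    """
    p_seed = prob(seed, candidate)
    if p_seed <= 0.0:
        return False
    plausible = 0
    for record in dataset:
        p = prob(record, candidate)
        # A record is "indistinguishable" from the seed if its generation
        # probability is within a multiplicative factor gamma of the seed's.
        if p > 0.0 and 1.0 / gamma <= p / p_seed <= gamma:
            plausible += 1
    return plausible >= k


def toy_prob(record: Record, candidate: Record) -> float:
    """Toy generative model (assumption): each bit is kept with
    probability 0.75 and flipped with probability 0.25, independently."""
    p = 1.0
    for a, b in zip(record, candidate):
        p *= 0.75 if a == b else 0.25
    return p


dataset = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
seed = dataset[0]
candidate = (0, 0, 0)

# With gamma = 4, only the seed and the one-bit neighbour are plausible.
release_k2 = passes_privacy_test(seed, candidate, dataset, toy_prob, k=2, gamma=4.0)
release_k3 = passes_privacy_test(seed, candidate, dataset, toy_prob, k=3, gamma=4.0)
# A looser gamma admits the two-bit neighbour as well.
release_k3_loose = passes_privacy_test(seed, candidate, dataset, toy_prob, k=3, gamma=10.0)
```

In this sketch a candidate is released only when the test returns True; raising `k` or lowering `gamma` makes the test stricter and rejects more candidates.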




Private sampling: a noiseless approach for generating differentially private synthetic data

In a world where artificial intelligence and data science become omnipre...

MC-GEN: Multi-level Clustering for Private Synthetic Data Generation

Nowadays, machine learning is one of the most common technology to turn ...

P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model

How can we release a massive volume of sensitive data while mitigating p...

Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders

Synthetic data has been hailed as the silver bullet for privacy preservi...

DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization

A large amount of high-dimensional and heterogeneous data appear in prac...

Differentially Private Synthetic Heavy-tailed Data

The U.S. Census Longitudinal Business Database (LBD) product contains em...

Design of a Privacy-Preserving Data Platform for Collaboration Against Human Trafficking

Case records on identified victims of human trafficking are highly sensi...
