PreFair: Privately Generating Justifiably Fair Synthetic Data

12/20/2022
by David Pujol, et al.

When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data while maintaining the privacy of the original data. Multiple works have therefore been devoted to devising systems for DP synthetic data generation. However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system for DP fair synthetic data generation. PreFair extends state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures the synthetic data is fair. We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.
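To make the setting concrete, the following is a minimal sketch of the generic idea behind DP synthetic data generation: release a noisy histogram with the Laplace mechanism, then sample synthetic rows from it. It is not PreFair's algorithm and applies no fairness criterion. The function names, the toy attributes, and the assumption of add/remove-one-row neighbouring datasets (histogram L1 sensitivity 1) are all illustrative choices, not taken from the paper.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(rows, domain, epsilon, rng):
    """Noisy counts over every value in `domain` (Laplace mechanism).

    Assumption: neighbouring datasets differ by adding or removing one row,
    so the histogram has L1 sensitivity 1 and the noise scale is 1/epsilon.
    """
    counts = Counter(rows)
    noisy = {v: counts.get(v, 0) + laplace_noise(1.0 / epsilon, rng) for v in domain}
    # Clipping negatives is post-processing and does not affect the DP guarantee.
    return {v: max(c, 0.0) for v, c in noisy.items()}


def sample_synthetic(noisy_counts, n, rng):
    """Draw n synthetic rows with probability proportional to the noisy counts."""
    values = list(noisy_counts)
    weights = [noisy_counts[v] for v in values]
    if sum(weights) == 0:
        # Fall back to uniform sampling if every noisy count was clipped to zero.
        weights = [1.0] * len(values)
    return rng.choices(values, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy table with two attributes: (group, outcome).
    domain = [("A", "yes"), ("A", "no"), ("B", "yes"), ("B", "no")]
    private_rows = (
        [("A", "yes")] * 40 + [("A", "no")] * 10
        + [("B", "yes")] * 15 + [("B", "no")] * 35
    )
    noisy = dp_histogram(private_rows, domain, epsilon=1.0, rng=rng)
    synthetic = sample_synthetic(noisy, n=100, rng=rng)
    print(Counter(synthetic))
```

Note that the toy synthetic table inherits whatever association between group and outcome exists in the private rows, which is the kind of preserved unfairness the abstract describes.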

