On the Usefulness of Synthetic Tabular Data Generation

by Dionysis Manousakas, et al.

Despite recent advances in synthetic data generation, the scientific community still lacks consensus on its usefulness. It is commonly believed that synthetic data can be used both for data exchange and for boosting machine learning (ML) training. Privacy-preserving synthetic data generation can accelerate data exchange for downstream tasks, but there is not enough evidence to show how or why synthetic data can boost ML training. In this study, we benchmarked ML performance using synthetic tabular data for four use cases: data sharing, data augmentation, class balancing, and data summarization. We observed marginal improvements for the balancing use case on some datasets. However, we conclude that there is not enough evidence to claim that synthetic tabular data is useful for ML training.
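The four use cases differ mainly in which rows a downstream model is trained on. The sketch below illustrates one possible reading of each; the abstract does not spell out the benchmark protocol, so the function name `build_training_set`, its parameters, and the definition of each use case are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def build_training_set(real, synthetic, use_case, label_index=-1, summary_size=None):
    """Return the rows a model would be trained on for one use case.

    real / synthetic: lists of rows (tuples) with the class label at label_index.
    The use-case definitions here are illustrative assumptions.
    """
    if use_case == "sharing":
        # Real rows are not exposed: train on synthetic rows only.
        return list(synthetic)
    if use_case == "augmentation":
        # Train on the real rows plus all synthetic rows.
        return list(real) + list(synthetic)
    if use_case == "balancing":
        # Add synthetic rows only for classes below the majority-class count.
        counts = Counter(row[label_index] for row in real)
        target = max(counts.values())
        train = list(real)
        for row in synthetic:
            label = row[label_index]
            if counts[label] < target:
                train.append(row)
                counts[label] += 1
        return train
    if use_case == "summarization":
        # A synthetic set smaller than the real one stands in for it.
        size = summary_size if summary_size is not None else max(1, len(real) // 10)
        return list(synthetic[:size])
    raise ValueError(f"unknown use case: {use_case}")


# Toy rows of the form (feature, label); the values are made up.
real_rows = [(0.1, 0), (0.2, 0), (0.3, 0), (0.9, 1)]
synthetic_rows = [(0.15, 0), (0.85, 1), (0.95, 1), (0.8, 1)]

balanced = build_training_set(real_rows, synthetic_rows, "balancing")
print(Counter(label for _, label in balanced))
```

In a benchmark of this kind, each training set would be used to fit the same model, and every model would be scored on held-out real data so that the use cases can be compared against a real-data-only baseline.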




Synthetic Data for Model Selection

Recent improvements in synthetic data generation make it possible to pro...

A supervised generative optimization approach for tabular data

Synthetic data generation has emerged as a crucial topic for financial i...

Synthetic data, real errors: how (not) to publish and use synthetic data

Generating synthetic data through generative models is gaining interest ...

Conditional Synthetic Data Generation for Robust Machine Learning Applications with Limited Pandemic Data

Background: At the onset of a pandemic, such as COVID-19, data with prop...

Training Data Augmentation for Deep Learning RF Systems

Applications of machine learning are subject to three major components t...

Synthcity: facilitating innovative use cases of synthetic data in different data modalities

Synthcity is an open-source software package for innovative use cases of...

Conditional Synthetic Data Generation for Personal Thermal Comfort Models

Personal thermal comfort models aim to predict an individual's thermal c...
