Can segmentation models be trained with fully synthetically generated data?

by Virginia Fernandez, et al.

To achieve good performance and generalisability, medical image segmentation models should be trained on sizeable datasets with sufficient variability. Due to ethics and governance restrictions, and the costs associated with labelling data, scientific development is often stifled, with models trained and tested on limited data. Data augmentation is often used to artificially increase the variability in the data distribution and improve model generalisability. Recent works have explored deep generative models for image synthesis, as such an approach would enable the generation of an effectively infinite amount of varied data, addressing the generalisability and data access problems. However, many proposed solutions limit the user's control over what is generated. In this work, we propose brainSPADE, a model which combines a synthetic diffusion-based label generator with a semantic image generator. Our model can produce fully synthetic brain labels on demand, with or without a pathology of interest, and then generate a corresponding MRI image of an arbitrary guided style. Experiments show that brainSPADE synthetic data can be used to train segmentation models with performance comparable to that of models trained on real data.
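The two-stage data flow described above (sample a label map, then render an image conditioned on that label map and a style) can be sketched in a few lines. This is only an illustration of how paired training data could be assembled, not the brainSPADE implementation: `sample_label_map`, `render_image`, `make_synthetic_dataset`, the class count, and the `style_means` parameter are all hypothetical stand-ins, and simple random sampling replaces the trained diffusion and semantic image generators.

```python
import numpy as np

NUM_CLASSES = 4  # assumption: 3 generic classes plus 1 "pathology" class (last index)


def sample_label_map(rng, shape=(32, 32), with_pathology=False):
    """Hypothetical stand-in for the label generator.

    A real system would sample from a trained diffusion model; here we
    draw random non-pathology classes and optionally stamp a lesion.
    """
    labels = rng.integers(0, NUM_CLASSES - 1, size=shape)  # classes 0..NUM_CLASSES-2
    if with_pathology:
        labels[12:18, 12:18] = NUM_CLASSES - 1  # 6x6 square of the pathology class
    return labels


def render_image(rng, labels, style_means):
    """Hypothetical stand-in for the semantic image generator.

    The "style" is reduced to one mean intensity per class; a real
    generator would be a trained network conditioned on the label map.
    """
    means = np.asarray(style_means, dtype=np.float32)
    image = means[labels] + rng.normal(0.0, 0.05, size=labels.shape)
    return image.astype(np.float32)


def make_synthetic_dataset(n, style_means, pathology_fraction=0.5, seed=0):
    """Build n (image, label) pairs; the first fraction contain pathology."""
    rng = np.random.default_rng(seed)
    images, masks = [], []
    for i in range(n):
        with_pathology = i < int(n * pathology_fraction)
        labels = sample_label_map(rng, with_pathology=with_pathology)
        images.append(render_image(rng, labels, style_means))
        masks.append(labels)
    return np.stack(images), np.stack(masks)


# Eight synthetic pairs, half with pathology, in one illustrative "style".
images, masks = make_synthetic_dataset(8, style_means=[0.0, 0.3, 0.6, 1.0])
```

The resulting `images` and `masks` arrays are aligned pairs, so they could be passed to any segmentation training loop in place of real annotated scans; changing `style_means` or `pathology_fraction` is the toy analogue of the user control over style and pathology that the abstract emphasises.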




