Detect and Correct Bias in Multi-Site Neuroimaging Datasets

02/12/2020
by Christian Wachinger, et al.

The desire to train complex machine learning algorithms and to increase the statistical power in association studies drives neuroimaging research to use ever-larger datasets. The most obvious way to increase sample size is by pooling scans from independent studies. However, simple pooling is often ill-advised, as selection, measurement, and confounding biases may creep in and yield spurious correlations. In this work, we combine 35,320 magnetic resonance images of the brain from 17 studies to examine bias in neuroimaging. In the first experiment, Name That Dataset, we provide empirical evidence for the presence of bias by showing that scans can be correctly assigned to their respective dataset with 71.5% accuracy. Given such evidence, we take a closer look at confounding bias, which is often viewed as the main shortcoming in observational studies. In practice, we neither know all potential confounders nor have data on them. Hence, we model confounders as unknown, latent variables. Kolmogorov complexity is then used to decide whether the confounded or the causal model provides the simplest factorization of the graphical model. Finally, we present methods for dataset harmonization and study their ability to remove bias in imaging features. In particular, we propose an extension of the recently introduced ComBat algorithm to control for global variation across image features, inspired by adjusting for population stratification in genetics. Our results demonstrate that harmonization can reduce dataset-specific information in image features. Further, confounding bias can be reduced and even turned into a causal relationship. However, harmonization also requires caution, as it can easily remove relevant subject-specific information.
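The idea behind Name That Dataset and harmonization can be illustrated with a toy sketch. Everything here is an assumption for illustration: the data are synthetic, the function names are hypothetical, the classifier is a plain nearest-centroid rule, and the harmonization step is a simple per-site location and scale standardization. It is not ComBat (there is no empirical Bayes shrinkage and no covariate model) and not the extension proposed in the paper.

```python
import numpy as np


def make_sites(n_sites=3, n_per_site=200, n_features=5, seed=0):
    """Synthetic features with an additive and multiplicative site effect."""
    rng = np.random.default_rng(seed)
    X, site = [], []
    for s in range(n_sites):
        shift = 1.0 * s          # assumed additive site effect
        scale = 1.0 + 0.1 * s    # assumed multiplicative site effect
        noise = rng.normal(0.0, 1.0, size=(n_per_site, n_features))
        X.append(shift + scale * noise)
        site.append(np.full(n_per_site, s))
    return np.vstack(X), np.concatenate(site)


def name_that_dataset_accuracy(X, site, seed=0):
    """Guess the originating site with a nearest-centroid classifier.

    Half of the samples are used to estimate site centroids; accuracy
    is reported on the other half.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    train, test = idx[:half], idx[half:]
    labels = np.unique(site)
    centroids = np.stack(
        [X[train][site[train] == k].mean(axis=0) for k in labels]
    )
    # Squared Euclidean distance of every test sample to every centroid.
    dist = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = labels[dist.argmin(axis=1)]
    return float((pred == site[test]).mean())


def harmonize_location_scale(X, site):
    """Standardize each feature within each site, then map every site
    back to the pooled mean and standard deviation."""
    Xh = np.empty_like(X, dtype=float)
    pooled_mean = X.mean(axis=0)
    pooled_std = X.std(axis=0)
    for k in np.unique(site):
        m = site == k
        z = (X[m] - X[m].mean(axis=0)) / X[m].std(axis=0)
        Xh[m] = z * pooled_std + pooled_mean
    return Xh


X, site = make_sites()
acc_before = name_that_dataset_accuracy(X, site)
acc_after = name_that_dataset_accuracy(harmonize_location_scale(X, site), site)
print(f"site accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

In this toy setting, the site can be guessed well above chance before harmonization and close to chance afterwards. Note that the standardization removes every per-site difference in mean and variance, including any that reflect real differences between the populations, which mirrors the caution in the abstract about removing relevant subject-specific information.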

research
07/09/2019

Quantifying Confounding Bias in Neuroimaging Datasets with Causal Inference

Neuroimaging datasets keep growing in size to address increasingly compl...
research
05/27/2022

Combining observational datasets from multiple environments to detect hidden confounding

A common assumption in causal inference from observational data is the a...
research
04/28/2018

Detect, Quantify, and Incorporate Dataset Bias: A Neuroimaging Analysis on 12,207 Individuals

Neuroimaging datasets keep growing in size to address increasingly compl...
research
12/01/2017

Causal inference taking into account unobserved confounding

Causal inference with observational data can be performed under an assum...
research
05/01/2020

Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates

Many applications of computational social science aim to infer causal co...
research
10/10/2019

Machine Learning with Multi-Site Imaging Data: An Empirical Study on the Impact of Scanner Effects

This is an empirical study to investigate the impact of scanner effects ...
research
04/20/2021

Hidden Biases in Unreliable News Detection Datasets

Automatic unreliable news detection is a research problem with great pot...
