Interpretable, similarity-driven multi-view embeddings from high-dimensional biomedical data
Inter-modality covariation, leveraged as a scientific principle, can inform the development of novel hypotheses and increase statistical power in the analysis of diverse data. We present similarity-driven multi-view linear reconstruction (SiMLR), an algorithm that exploits inter-modality relationships to transform large scientific datasets into smaller, better-powered and interpretable low-dimensional spaces. Novel aspects of this methodology include its objective function for identifying joint signal, an efficient approach based on sparse matrices for representing prior within-modality relationships, and an efficient implementation that allows SiMLR to be applied to relatively large datasets with multiple modalities, each of which may have millions of entries. We first describe and contextualize SiMLR theory and implementation strategies. We then illustrate the method in simulated data to establish its expected performance. Subsequently, we present succinct SiMLR case studies, with comparisons to related methods, in publicly accessible example datasets. Lastly, we use SiMLR to derive a neurobiological embedding from three types of measurements: two from structural neuroimaging, complemented by single nucleotide polymorphisms (SNPs) from 44 depression- and anxiety-related loci. We find that, in a validation dataset, the low-dimensional space derived from the training set exhibits above-chance relationships with clinical measurements of anxiety and, to a lesser degree, depression. The results suggest that SiMLR is able to derive a low-dimensional representation space that, in suitable datasets, may be clinically relevant. Taken together, these results show that SiMLR may be applied with default parameters to joint signal estimation from disparate modalities and may yield practically useful results.