Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors

by Chandler Squires, et al.

We consider the problem of learning the structure of a causal directed acyclic graph (DAG) model in the presence of latent variables. We define latent factor causal models (LFCMs) as a restriction on causal DAG models with latent variables, which are composed of clusters of observed variables that share the same latent parent and connections between these clusters given by edges pointing from the observed variables to latent variables. LFCMs are motivated by gene regulatory networks, where regulatory edges, corresponding to transcription factors, connect spatially clustered genes. We show identifiability results on this model and design a consistent three-stage algorithm that discovers clusters of observed nodes, a partial ordering over clusters, and finally, the entire structure over both observed and latent nodes. We evaluate our method in a synthetic setting, demonstrating its ability to almost perfectly recover the ground truth clustering even at relatively low sample sizes, as well as the ability to recover a significant number of the edges from observed variables to latent factors. Finally, we apply our method in a semi-synthetic setting to protein mass spectrometry data with a known ground truth network, and achieve almost perfect recovery of the ground truth variable clusters.
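The LFCM definition above can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `LFCM`, its fields, and the toy node names are hypothetical, chosen only for illustration. It encodes the two ingredients named in the abstract (each observed variable has one latent parent, which defines its cluster, and clusters are connected by edges from observed variables to latent variables) and derives one linear order over clusters that is consistent with those edges.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class LFCM:
    """Toy container for a latent factor causal model (hypothetical names)."""
    # Maps each observed variable to its single latent parent.
    latent_parent: dict
    # Edges between clusters, each pointing from an observed variable to a latent.
    cross_edges: set = field(default_factory=set)

    def clusters(self):
        """Group observed variables by their shared latent parent."""
        groups = defaultdict(set)
        for obs, lat in self.latent_parent.items():
            groups[lat].add(obs)
        return dict(groups)

    def cluster_edges(self):
        """Edges induced between clusters: (source latent, target latent)."""
        return {(self.latent_parent[obs], lat) for obs, lat in self.cross_edges}

    def cluster_order(self):
        """Return one topological order of the clusters, or None if cyclic."""
        latents = set(self.latent_parent.values()) | {l for _, l in self.cross_edges}
        children = defaultdict(set)
        indegree = {l: 0 for l in latents}
        for src, dst in self.cluster_edges():
            children[src].add(dst)
            indegree[dst] += 1
        # Kahn's algorithm; sorting makes the returned order deterministic.
        queue = deque(sorted(l for l in latents if indegree[l] == 0))
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for child in sorted(children[node]):
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
        return order if len(order) == len(latents) else None


# Two clusters, {x1, x2} under L1 and {x3, x4} under L2, linked by x1 -> L2.
model = LFCM(
    latent_parent={"x1": "L1", "x2": "L1", "x3": "L2", "x4": "L2"},
    cross_edges={("x1", "L2")},
)
```

Here `model.clusters()` groups the observed variables by latent parent and `model.cluster_order()` places `L1` before `L2`. Because every directed path in such a graph alternates between latent and observed nodes, the full graph is acyclic exactly when the induced cluster graph is, so a `None` result signals an invalid model. This sketch only represents a known structure; it does not perform the structure learning described in the abstract.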




