Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast
Causal gene networks model the flow of information within a cell, but reconstructing them from omics data is challenging because correlation does not imply causation. Combining genomics and transcriptomics data from a segregating population allows to orient the direction of causality between gene expression traits using genomic variants. Instrumental-variable methods (IV) use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods (ME) additionally require that distal eQTL associations are mediated by the source gene. Here we used Findr, a software providing uniform implementations of IV, ME, and coexpression-based methods, a recent dataset of 1,012 segregants from a cross between two budding yeast strains, and the YEASTRACT database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of ME decreases at large sample sizes, due to a loss of sensitivity when residual correlations become significant. IV methods contain false positive predictions, due to genomic linkage between eQTL instruments. IV and ME methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. IV methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas ME failed due to Stb5p auto-regulating its own expression. ME suggests a new candidate gene, DNM1, for a hotspot on Chr XII, where IV methods could not distinguish between multiple genes located within the hotspot.
READ FULL TEXT