Refining interaction search through signed iterative Random Forests

by Karl Kumbier, et al.

Advances in supervised learning have enabled accurate prediction in biological systems governed by complex interactions among biomolecules. However, state-of-the-art predictive algorithms are typically black boxes that learn statistical interactions which are difficult to translate into testable hypotheses. The iterative Random Forest algorithm (iRF) took a step toward bridging this gap by providing a computationally tractable procedure to identify the stable, high-order feature interactions that drive the predictive accuracy of Random Forests (RF). Here we refine the interactions identified by iRF to explicitly map responses as a function of interacting features. Our method, signed iRF (s-iRF), describes subsets of rules that frequently occur on RF decision paths. We refer to these rule subsets as signed interactions. Signed interactions share not only the same set of interacting features but also similar thresholding behavior, and thus describe a consistent functional relationship between interacting features and responses. We describe stable and predictive importance metrics (SPIMs) to rank signed interactions. For each SPIM, we define null importance metrics that characterize its expected behavior under known structure. We evaluate our proposed approach in biologically inspired simulations and in two case studies: predicting enhancer activity and predicting spatial gene expression patterns. In the case of enhancer activity, s-iRF recovers one of the few experimentally validated high-order interactions and suggests novel enhancer elements where this interaction may be active. In the case of spatial gene expression patterns, s-iRF recovers all 11 reported links in the gap gene network. By refining the process of interaction recovery, our approach has the potential to guide mechanistic inquiry into systems whose scale and complexity are beyond human comprehension.
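To make the idea of a signed interaction concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy forest written by hand as nested tuples, labels each split on a decision path with a sign ("-" for the at-or-below-threshold branch, "+" for the above-threshold branch), and counts how often each pair of signed features co-occurs on paths that end in the positive class. The tree encoding, the function names `signed_paths` and `count_signed_interactions`, and the exhaustive pair counting are all illustrative assumptions; the sketch does not reproduce the search procedure or the importance metrics described above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy encoding (an assumption for illustration only):
#   internal node = (feature_name, threshold, left_subtree, right_subtree)
#   leaf          = predicted class label (0 or 1)
# The left branch means x[feature] <= threshold (sign "-"),
# the right branch means x[feature] > threshold (sign "+").


def signed_paths(node, prefix=()):
    """Yield (signed_features, leaf_prediction) for every root-to-leaf path."""
    if not isinstance(node, tuple):
        # Reached a leaf: report the signed features collected on the way down.
        yield frozenset(prefix), node
        return
    feature, _threshold, left, right = node
    yield from signed_paths(left, prefix + ((feature, "-"),))
    yield from signed_paths(right, prefix + ((feature, "+"),))


def count_signed_interactions(forest, order=2, leaf_class=1):
    """Count signed feature subsets of a given size on paths predicting leaf_class.

    Assumes no feature is split on twice along a single path, which holds
    for the toy trees below but not for real forests in general.
    """
    counts = Counter()
    for tree in forest:
        for signed, prediction in signed_paths(tree):
            if prediction != leaf_class:
                continue
            for combo in combinations(sorted(signed), order):
                counts[frozenset(combo)] += 1
    return counts


# Three hand-written trees. Two of them reach class 1 only when both
# A and B are above their thresholds; the third uses A high and C low.
forest = [
    ("A", 0.5, 0, ("B", 0.5, 0, 1)),
    ("B", 0.4, 0, ("A", 0.6, 0, 1)),
    ("A", 0.5, 0, ("C", 0.3, 1, 0)),
]

counts = count_signed_interactions(forest)
for interaction, n in counts.most_common():
    print(sorted(interaction), n)
```

In this toy forest the pair (A high, B high) appears on two class-1 paths even though the two trees split at different thresholds, which is the sense in which a signed interaction groups rules with the same features and similar thresholding behavior.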




Iterative Random Forests to detect predictive and stable high-order interactions

Genomics has revolutionized biology, enabling the interrogation of whole...

Provable Boolean Interaction Recovery from Tree Ensemble obtained via Random Forests

Random Forests (RF) are at the cutting edge of supervised machine learni...

Queues on interacting networks

Interacting networks are different in nature to single networks. The stu...

How complex is the microarray dataset? A novel data complexity metric for biological high-dimensional microarray data

Data complexity analysis quantifies the hardness of constructing a predi...

JigSaw: A tool for discovering explanatory high-order interactions from random forests

Machine learning is revolutionizing biology by facilitating the predicti...

Interpretable Random Forests via Rule Extraction

We introduce SIRUS (Stable and Interpretable RUle Set) for regression, a...

Empowering individual trait prediction using interactions

One component of precision medicine is to construct prediction models wi...
