Multi-view Factorization AutoEncoder with Network Constraints for Multi-omic Integrative Analysis
Multi-omic data provides multiple views of the same patients. Integrative analysis of multi-omic data is crucial to elucidate the molecular underpinning of disease etiology. However, multi-omic data has the "big p, small N" problem (the number of features is large, but the number of samples is small), it is challenging to train a complicated machine learning model from the multi-omic data alone and make it generalize well. Here we propose a framework termed Multi-view Factorization AutoEncoder with network constraints to integrate multi-omic data with domain knowledge (biological interactions networks). Our framework employs deep representation learning to learn feature embeddings and patient embeddings simultaneously, enabling us to integrate feature interaction network and patient view similarity network constraints into the training objective. The whole framework is end-to-end differentiable. We applied our approach to the TCGA Pan-cancer dataset and achieved satisfactory results to predict disease progression-free interval (PFI) and patient overall survival (OS) events. Code will be made publicly available.
READ FULL TEXT