Deep neural network improves the estimation of polygenic risk scores for breast cancer
Polygenic risk scores (PRS) estimate an individual's genetic risk for a complex disease based on many genetic variants across the whole genome. In this study, we compared a series of computational models for estimating breast cancer PRS. A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms, including BLUP, BayesA and LDpred. In the test cohort with 50 [...] the Area Under the receiver operating characteristic Curve (AUC) were 67.4 [...] BLUP, 64.5 [...] generated PRS that followed a normal distribution in the case population. However, the PRS generated by DNN in the case population followed a bimodal distribution composed of two normal distributions with distinctly different means. This suggests that DNN was able to separate the case population into a high-genetic-risk case sub-population, with an average PRS significantly higher than that of the control population, and a normal-genetic-risk case sub-population, with an average PRS similar to that of the control population. This allowed DNN to achieve 18.8 [...] be extrapolated to 65.4 [...] 12 [...] that were assigned insignificant p-values by association studies, but were important for DNN prediction. These variants may be associated with the phenotype through non-linear relationships.
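The abstract mentions extrapolating test-cohort performance to another setting. One ingredient of any such extrapolation is that precision depends on disease prevalence, while recall and false-positive rate at a fixed score threshold do not. The following is a minimal sketch of that adjustment using Bayes' rule. It is not the authors' procedure; the function names and all numeric inputs are hypothetical and chosen only for illustration.

```python
def precision_at_prevalence(recall: float, fpr: float, prevalence: float) -> float:
    """Precision (positive predictive value) via Bayes' rule.

    recall:     P(predicted positive | case), fixed by the score threshold
    fpr:        P(predicted positive | control), fixed by the score threshold
    prevalence: P(case) in the population of interest
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    true_pos = recall * prevalence
    false_pos = fpr * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        return float("nan")  # nothing is predicted positive
    return true_pos / (true_pos + false_pos)


def fpr_from_precision(recall: float, precision: float, prevalence: float) -> float:
    """False-positive rate implied by a recall/precision pair at a given prevalence."""
    if not 0.0 < precision <= 1.0 or not 0.0 <= prevalence < 1.0:
        raise ValueError("need 0 < precision <= 1 and 0 <= prevalence < 1")
    return recall * prevalence * (1.0 - precision) / (precision * (1.0 - prevalence))


# Hypothetical operating point measured in a balanced (50% cases) cohort.
cohort_recall = 0.5
cohort_precision = 0.8
implied_fpr = fpr_from_precision(cohort_recall, cohort_precision, prevalence=0.5)

# Same threshold applied to a hypothetical population with 10% prevalence.
population_precision = precision_at_prevalence(cohort_recall, implied_fpr, prevalence=0.1)
print(f"implied FPR: {implied_fpr:.3f}")
print(f"precision at 10% prevalence: {population_precision:.3f}")
```

At a fixed threshold, recall is unchanged and only precision shifts with prevalence. Reaching a different recall at a different precision additionally requires moving the threshold along the model's ROC curve, which this sketch does not attempt.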