Estimation of the number of spikes using a generalized spike population model and application to RNA-seq data
Although the generalized spike population model has been actively studied in random matrix theory, its application to real data has rarely been explored. We find that most methods for determining the number of spikes based on Johnstone's spike population model select far too many spikes in RNA-seq gene expression data, or fail to determine the number of spikes at all because they flag every component as a spike. In this paper, we propose a new algorithm for estimating the number of spikes based on a generalized spike population model. We also propose a new noise model for RNA-seq data, built on population spectral distribution ideas, which yields a biologically reasonable number of spikes when used with the proposed algorithm. Finally, we propose a graphical tool for assessing the performance of the underlying noise model.
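For context, a common baseline under Johnstone's spike population model assumes white noise with covariance sigma2 * I and counts the sample-covariance eigenvalues that exceed the Marchenko-Pastur upper edge sigma2 * (1 + sqrt(p/n))^2. The sketch below illustrates only that baseline, not the algorithm proposed in this paper. It assumes a known noise variance `sigma2` and zero-mean data, and the function name `count_spikes_above_mp_edge` is hypothetical.

```python
import numpy as np


def count_spikes_above_mp_edge(X: np.ndarray, sigma2: float) -> int:
    """Baseline spike count under Johnstone's model (illustrative sketch).

    Assumes the rows of X (n x p) are i.i.d. zero-mean samples whose noise
    covariance is sigma2 * I, with sigma2 known. Eigenvalues of the sample
    covariance above the Marchenko-Pastur upper edge are counted as spikes.
    """
    n, p = X.shape
    S = X.T @ X / n                      # sample covariance (mean assumed zero)
    eigvals = np.linalg.eigvalsh(S)      # eigenvalues of the symmetric matrix S
    gamma = p / n                        # aspect ratio p/n
    edge = sigma2 * (1.0 + np.sqrt(gamma)) ** 2  # Marchenko-Pastur upper edge
    return int(np.count_nonzero(eigvals > edge))


if __name__ == "__main__":
    # Simulated example: 3 coordinates with population variance 50, rest noise.
    rng = np.random.default_rng(0)
    n, p = 2000, 200
    X = rng.standard_normal((n, p))
    X[:, :3] *= np.sqrt(50.0)
    print(count_spikes_above_mp_edge(X, sigma2=1.0))
```

Because this rule presumes isotropic noise, any structured, non-white noise spectrum pushes many eigenvalues past the edge, which is consistent with the over-counting the abstract reports for RNA-seq data.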