Machine Learning based Prediction of Hierarchical Classification of Transposable Elements
Transposable Elements (TEs) or jumping genes are the DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also cause an increase in the rate of mutation and can even mediate duplications and large insertions and deletions in the genome, promoting gross genetic rearrangements. Thus, the proper classification of the identified jumping genes is essential to understand their genetic and evolutionary effects in the genome. While computational methods have been developed that perform either binary classification or multi-label classification of TEs, few studies have focused on their hierarchical classification. The state-of-the-art machine learning classification method utilizes a Multi-Layer Perceptron (MLP), a class of neural network, for hierarchical classification of TEs. However, the existing methods have limited accuracy in classifying TEs. A more effective classifier, which can explain the role of TEs in germline and somatic evolution, is needed. In this study, we examine the performance of a variety of machine learning (ML) methods. And eventually, propose a robust approach for the hierarchical classification of TEs, with higher accuracy, using Support Vector Machines (SVM).
READ FULL TEXT