SARS-Cov-2 RNA Sequence Classification Based on Territory Information

01/09/2021
by   Jingwei Liu, et al.
13

CovID-19 genetics analysis is critical to determine virus type,virus variant and evaluate vaccines. In this paper, SARS-Cov-2 RNA sequence analysis relative to region or territory is investigated. A uniform framework of sequence SVM model with various genetics length from short to long and mixed-bases is developed by projecting SARS-Cov-2 RNA sequence to different dimensional space, then scoring it according to the output probability of pre-trained SVM models to explore the territory or origin information of SARS-Cov-2. Different sample size ratio of training set and test set is also discussed in the data analysis. Two SARS-Cov-2 RNA classification tasks are constructed based on GISAID database, one is for mainland, Hongkong and Taiwan of China, and the other is a 6-class classification task (Africa, Asia, Europe, North American, South American& Central American, Ocean) of 7 continents. For 3-class classification of China, the Top-1 accuracy rate can reach 82.45% (train 60%, test=40%); For 2-class classification of China, the Top-1 accuracy rate can reach 97.35% (train 80%, test 20%); For 6-class classification task of world, when the ratio of training set and test set is 20% : 80% , the Top-1 accuracy rate can achieve 30.30%. And, some Top-N results are also given.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro