Multiple Genome Analytics Framework: The Case of All SARS-CoV-2 Complete Variants

Pattern detection and string matching are fundamental problems in computer science, and the accelerated expansion of bioinformatics and computational biology has made them a core topic for both disciplines. The SARS-CoV-2 pandemic has made such problems more demanding: constant mutations lead to hundreds or thousands of new genome variants being discovered every week, and there is an urgent need for fast and accurate analyses. Computational tools for genomic analyses, such as sequence alignment, are essential, although in most cases they require enormous resources and computational power. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and pattern detection, that can help to efficiently address several computational biology and bioinformatics problems concurrently with minimal resources. A single execution of advanced algorithms, with space and time complexity O(n log n), is enough to acquire knowledge of all repeated patterns that exist in multiple genome sequences, and this information can be used by other meta-algorithms for further meta-analyses. The potential of the proposed framework is demonstrated with the analysis of more than 300,000 SARS-CoV-2 genome sequences and the detection of all repeated patterns with length up to 60 nucleotides in these sequences. These results have been used to answer questions involving common patterns among all variants, sequence alignment, detection of palindromes and tandem repeats, genome comparisons between different organisms, detection of polymerase chain reaction primers, and more.
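To make the notion of "all repeated patterns up to a given length" concrete, the following minimal sketch enumerates them by brute force over a few toy sequences. It is not the framework's data structure or algorithm, which the abstract does not detail; the function names `repeated_patterns` and `common_to_all` and the toy sequences are assumptions made purely for illustration, and this naive approach would not scale to hundreds of thousands of genomes.

```python
from collections import defaultdict


def repeated_patterns(sequences, max_len):
    """Return {pattern: [(sequence_index, start), ...]} for every substring
    of length 1..max_len that occurs at least twice across all sequences.

    Brute-force illustration only (hypothetical helper, not the paper's method).
    """
    occurrences = defaultdict(list)
    for seq_idx, seq in enumerate(sequences):
        for length in range(1, max_len + 1):
            for start in range(len(seq) - length + 1):
                occurrences[seq[start:start + length]].append((seq_idx, start))
    # Keep only patterns seen two or more times in total.
    return {p: pos for p, pos in occurrences.items() if len(pos) >= 2}


def common_to_all(patterns, n_sequences):
    """Return the set of patterns that appear in every sequence."""
    return {
        p for p, pos in patterns.items()
        if len({seq_idx for seq_idx, _ in pos}) == n_sequences
    }


# Toy, made-up sequences (not real SARS-CoV-2 data).
toy_genomes = ["ACGTACGT", "TTACGTAA", "GACGTC"]
patterns = repeated_patterns(toy_genomes, max_len=4)
shared = common_to_all(patterns, len(toy_genomes))
```

Once the occurrence table exists, follow-up questions such as "which patterns are common to all variants" become simple filters over it, which mirrors the abstract's idea of running meta-analyses on the output of a single detection pass.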


