LinearCoFold and LinearCoPartition: Linear-Time Algorithms for Secondary Structure Prediction of Interacting RNA molecules
Many ncRNAs function through RNA-RNA interactions. Fast and reliable RNA structure prediction with consideration of RNA-RNA interaction is useful. Some existing tools are less accurate due to omitting the competing of intermolecular and intramolecular base pairs, or focus more on predicting the binding region rather than predicting the complete secondary structure of two interacting strands. Vienna RNAcofold, which reduces the problem into the classical single sequence folding by concatenating two strands, scales in cubic time against the combined sequence length, and is slow for long sequences. To address these issues, we present LinearCoFold, which predicts the complete minimum free energy structure of two strands in linear runtime, and LinearCoPartition, which calculates the cofolding partition function and base pairing probabilities in linear runtime. LinearCoFold and LinearCoPartition follows the concatenation strategy of RNAcofold, but are orders of magnitude faster than RNAcofold. For example, on a sequence pair with combined length of 26,190 nt, LinearCoFold is 86.8x faster than RNAcofold MFE mode (0.6 minutes vs. 52.1 minutes), and LinearCoPartition is 642.3x faster than RNAcofold partition function mode (1.8 minutes vs. 1156.2 minutes). Different from the local algorithms, LinearCoFold and LinearCoPartition are global cofolding algorithms without restriction on base pair length. Surprisingly, LinearCoFold and LinearCoPartition's predictions have higher PPV and sensitivity of intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the RNA-RNA interaction between SARS-CoV-2 gRNA and human U4 snRNA, which has been experimentally studied, and observe that LinearCoFold's prediction correlates better to the wet lab results.
READ FULL TEXT