LinearPartition: Linear-Time Approximation of RNA Folding Partition Function and Base Pairing Probabilities

12/31/2019
by He Zhang, et al.

RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy (MFE) methods toward partition function-based methods, which account for the whole folding ensemble and can therefore estimate both structures and base pairing probabilities. However, the classic partition function algorithm scales cubically with sequence length and is therefore slow for long sequences. It is even slower than cubic-time MFE-based methods because of a larger constant factor in its runtime. Inspired by the success of the LinearFold algorithm, which computes the MFE structure in linear time, we address this issue by proposing a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base pairing probabilities. LinearPartition is 256x faster than Vienna RNAfold for a sequence of length 15,780, and 2,771x faster than CONTRAfold for a sequence of length 32,753. Interestingly, although LinearPartition is approximate, its linear runtime does not come at the cost of accuracy when base pairing probabilities are used to assemble structures, and it even yields a small accuracy improvement on the longer families (16S and 23S rRNA).
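To make the cubic scaling concrete, the following is a minimal sketch of a classic-style partition function recursion on a toy energy model. It is not LinearPartition and not the Vienna RNAfold or CONTRAfold model: the per-pair energy `PAIR_ENERGY`, the minimum hairpin loop size `MIN_LOOP`, and the function name `partition_function` are assumptions chosen for illustration only.

```python
import math

# Hypothetical toy parameters for illustration; not taken from the paper.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
PAIR_ENERGY = -1.0  # assumed free energy per base pair, kcal/mol
RT = 0.6163         # gas constant times temperature at 37 C, kcal/mol
MIN_LOOP = 3        # assumed minimum number of unpaired bases in a hairpin


def partition_function(seq):
    """Sum of Boltzmann weights over all non-crossing structures of seq.

    q[i][j] covers the half-open slice seq[i:j]. Three nested loops over
    (length, i, k) give the cubic runtime discussed in the abstract.
    """
    n = len(seq)
    w = math.exp(-PAIR_ENERGY / RT)  # Boltzmann weight of one base pair
    # Empty and too-short slices admit only the unpaired structure: weight 1.
    q = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            last = j - 1
            total = q[i][last]  # case 1: the last base is unpaired
            # case 2: the last base pairs with some k, leaving a large enough loop
            for k in range(i, last - MIN_LOOP):
                if (seq[k], seq[last]) in PAIRS:
                    total += q[i][k] * w * q[k + 1][last]
            q[i][j] = total
    return q[0][n]


if __name__ == "__main__":
    print(partition_function("GGAAACC"))
```

Each structure is counted exactly once because the last base is either unpaired or paired with a unique partner `k`. Base pairing probabilities would additionally need an outside pass, which this sketch omits.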
