The divergence time of protein structures modelled by Markov matrices and its relation to the divergence of sequences

by Sandun Rajapaksa, et al.

A complete time-parameterized statistical model quantifying the divergent evolution of protein structures in terms of the patterns of conservation of their secondary structures is inferred from a large collection of protein 3D structure alignments. This provides a better alternative to time-parameterized sequence-based models of protein relatedness, which have clear limitations when dealing with the twilight and midnight zones of sequence relationships. Since protein structures are far more conserved than sequences, due to the selection pressure directly placed on their function, divergence time estimates can be more accurate when inferred from structures. We use the Bayesian and information-theoretic framework of Minimum Message Length to infer a time-parameterized stochastic matrix (accounting for perturbed structural states of related residues) and associated Dirichlet models (accounting for insertions and deletions during the evolution of protein domains). These are used in concert to estimate the Markov time of divergence of tertiary structures, a task previously only possible using proxies (like RMSD). By analyzing one million pairs of homologous structures, we derive a relationship between the Markov divergence time of structures and that of sequences. Using these inferred models and this relationship, we demonstrate competitive performance in secondary structure prediction against neural network architectures commonly employed for this task. The source code and supplementary information are downloadable from <>.
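The idea of a time-parameterized stochastic matrix can be illustrated with a minimal sketch. It is not the authors' model: the three-state alphabet, the values in `BASE`, and the function names are all hypothetical, and a plain maximum-likelihood search over integer times stands in for the Minimum Message Length inference, with insertions and deletions ignored. Under those assumptions, the matrix at time t is the t-th power of a one-step matrix, and the divergence time of two aligned secondary-structure strings is the t that minimizes the cost, in bits, of stating their aligned state pairs.

```python
import numpy as np

# Assumed, simplified alphabet: helix (H), strand (E), coil (C).
STATES = "HEC"

# Hypothetical one-step transition matrix (each row sums to 1).
# These values are illustrative only; they are not taken from the paper.
BASE = np.array([
    [0.980, 0.005, 0.015],
    [0.005, 0.980, 0.015],
    [0.010, 0.010, 0.980],
])


def matrix_at_time(t):
    """Stochastic matrix after t Markov steps: BASE raised to the power t."""
    return np.linalg.matrix_power(BASE, t)


def stationary(base):
    """Stationary distribution: left eigenvector of base for eigenvalue 1."""
    vals, vecs = np.linalg.eig(base.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()


def cost_bits(pairs, t, pi):
    """Negative log2 joint probability of the aligned state pairs at time t."""
    m = matrix_at_time(t)
    return -sum(np.log2(pi[a] * m[a, b]) for a, b in pairs)


def estimate_time(s1, s2, max_t=300):
    """Integer time in 1..max_t with the lowest cost for two aligned strings."""
    pairs = [(STATES.index(a), STATES.index(b)) for a, b in zip(s1, s2)]
    pi = stationary(BASE)
    costs = [cost_bits(pairs, t, pi) for t in range(1, max_t + 1)]
    return 1 + int(np.argmin(costs))


if __name__ == "__main__":
    same = estimate_time("HHHHEEEECCCC", "HHHHEEEECCCC")
    diverged = estimate_time("HHHHEEEECCCC", "HHCCEEECCCCH")
    print(same, diverged)
```

Identical strings are best explained by the smallest time, and each additional mismatch pushes the estimate later, which is the sense in which the time parameter measures divergence.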


