Protein Representation Learning via Knowledge Enhanced Primary Structure Modeling

01/30/2023
by   Hong-Yu Zhou, et al.

Protein representation learning has benefited primarily from the remarkable development of language models (LMs). As a result, pre-trained protein models inherit a known weakness of LMs: a lack of factual knowledge. A recent solution models the relationships between proteins and their associated knowledge terms as the knowledge encoding objective. However, it does not explore these relationships at a finer granularity, namely the token level. To address this, we propose the Knowledge-exploited Auto-encoder for Protein (KeAP), which performs token-level knowledge graph exploration for protein representation learning. In practice, non-masked amino acids iteratively query the associated knowledge tokens via attention to extract and integrate information that helps restore the masked amino acids. We show that KeAP consistently outperforms the previous counterpart on 9 representative downstream applications, sometimes by large margins. These results suggest that KeAP offers an alternative yet effective way to perform knowledge-enhanced protein representation learning.
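The querying step described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it assumes plain single-head scaled dot-product attention with a residual update, uses random vectors in place of learned embeddings, and leaves out the learned projections and the decoder that would restore the masked amino acids. The names `attend`, `query_knowledge`, and `rounds` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query vector over memory vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale for vec in memory]
    weights = softmax(scores)
    # Weighted sum of the memory vectors (memory serves as keys and values here).
    return [sum(w * vec[i] for w, vec in zip(weights, memory))
            for i in range(len(query))]


def query_knowledge(residues, is_masked, knowledge, rounds=2):
    """Let non-masked residues iteratively query the knowledge tokens.

    Assumption: masked positions are skipped and left unchanged; a separate
    decoder (not shown) would be responsible for restoring them.
    """
    out = [list(vec) for vec in residues]
    for _ in range(rounds):
        for i, masked in enumerate(is_masked):
            if masked:
                continue
            context = attend(out[i], knowledge)
            out[i] = [a + b for a, b in zip(out[i], context)]  # residual update
    return out


# Toy data: 5 residue embeddings and 3 knowledge-token embeddings of size 4.
rng = random.Random(0)
dim = 4
residues = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
knowledge = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
is_masked = [False, True, False, False, True]

enhanced = query_knowledge(residues, is_masked, knowledge)
```

In this sketch only the non-masked positions change, which mirrors the abstract's statement that non-masked amino acids are the ones querying the knowledge tokens; how the enhanced representations are then decoded is left open.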


