Self-alignment Pre-training for Biomedical Entity Representations
Despite the widespread success of self-supervised learning via masked language models, learning representations directly from text to accurately capture complex and fine-grained semantic relationships in the biomedical domain remains as a challenge. Addressing this is of paramount importance for tasks such as entity linking where complex relational knowledge is pivotal. We propose SapBERT, a pre-training scheme based on BERT. It self-aligns the representation space of biomedical entities with a metric learning objective function leveraging UMLS, a collection of biomedical ontologies with >4M concepts. Our experimental results on six medical entity linking benchmarking datasets demonstrate that SapBERT outperforms many domain-specific BERT-based variants such as BioBERT, BlueBERT and PubMedBERT, achieving the state-of-the-art (SOTA) performances.
READ FULL TEXT