Clinical Named Entity Recognition using Contextualized Token Representations

by Yichao Zhou et al.

The clinical named entity recognition (CNER) task seeks to locate and classify clinical terminology into predefined categories, such as diagnostic procedure, disease disorder, severity, medication, medication dosage, and sign symptom. CNER facilitates the study of medication side effects, including the identification of novel phenomena and human-focused information extraction. Existing approaches to extracting the entities of interest rely on static word embeddings to represent each word. However, a word can have different interpretations depending on the context of the sentence, and static word embeddings are insufficient to integrate these diverse interpretations. To overcome this challenge, contextualized word embeddings have been introduced to better capture the semantic meaning of each word based on its context. Two such language models, ELMo and Flair, have been widely used in natural language processing to generate contextualized word embeddings from domain-generic documents. However, these embeddings are usually too general to capture the proximity among the vocabularies of specific domains. To facilitate various downstream applications using clinical case reports (CCRs), we pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair), using a clinically related corpus from PubMed Central. Extensive experiments show that our models gain dramatic improvements over both static word embeddings and domain-generic language models.
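To make the static-versus-contextualized distinction concrete, the toy sketch below (not the actual ELMo or Flair architecture, and with randomly initialized vectors invented for illustration) shows why a static lookup table assigns the word "cold" the same vector in "the patient has a cold" and "the weather is cold today", while even a crude context-mixing scheme produces different vectors for the two occurrences:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical static embedding table: one fixed vector per word,
# independent of the sentence the word appears in.
vocab = ["the", "patient", "has", "a", "cold", "weather", "is", "today"]
static = {w: rng.normal(size=8) for w in vocab}

def static_embed(sentence, word):
    """A static model returns the same vector for `word` in any sentence."""
    return static[word]

def contextual_embed(sentence, word):
    """A minimal stand-in for a contextual model: blend the word's static
    vector with the mean of its sentence neighbours, so the same word
    receives different vectors in different contexts."""
    tokens = sentence.split()
    context = np.mean([static[t] for t in tokens if t != word], axis=0)
    return 0.5 * static[word] + 0.5 * context

s1 = "the patient has a cold"   # medical sense of "cold"
s2 = "the weather is cold today"  # temperature sense of "cold"

# Static: identical vectors for "cold" in both sentences.
assert np.allclose(static_embed(s1, "cold"), static_embed(s2, "cold"))

# Contextual: the two occurrences of "cold" now differ,
# reflecting their different sentence contexts.
v1 = contextual_embed(s1, "cold")
v2 = contextual_embed(s2, "cold")
assert not np.allclose(v1, v2)
```

Real contextualized models replace the neighbour-averaging step with a deep bidirectional language model (character-level LSTMs in Flair, biLSTMs over token embeddings in ELMo), but the effect illustrated here is the same: the representation of a word becomes a function of its entire sentence.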

