Ontology-Based and Weakly Supervised Rare Disease Phenotyping from Clinical Notes

by Hang Dong, et al.

Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to identify because few cases are available for machine learning and data annotation requires domain experts. We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bidirectional Transformers (e.g. BERT). The ontology-based framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in the Unified Medical Language System (UMLS), using a Named Entity Recognition and Linking (NER+L) tool, SemEHR, together with weak supervision from customised rules and contextual mention representations; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach learns a phenotype confirmation model that improves Text-to-UMLS linking without annotated data from domain experts. We evaluated the approach on three clinical datasets of discharge summaries and radiology reports from two institutions in the US and the UK. Our best weakly supervised method achieved 81.4% recall on extracting rare disease UMLS phenotypes from MIMIC-III discharge summaries. Applied to clinical notes, the overall pipeline can surface rare disease cases, most of which are not captured in structured data (manually assigned ICD codes). Results on radiology reports from MIMIC-III and NHS Tayside were consistent with those on the discharge summaries. We discuss the usefulness of the weak supervision approach and propose directions for future studies.
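To make the two-step structure concrete, here is a minimal sketch of the data flow, not the authors' implementation. It assumes the NER+L tool has already produced mentions, each linked to a UMLS concept and carrying a negation flag. The `Mention` class, the `weak_label` rules, the concept and ORDO identifiers, and the `cui_to_ordo` mapping are all hypothetical placeholders. In the described method the rule outputs serve as weak labels for training a phenotype confirmation model on contextual (BERT-style) mention representations; this sketch omits that model and applies the rules directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Mention:
    text: str      # surface form of the mention in the clinical note
    cui: str       # UMLS concept linked by the NER+L tool (hypothetical ID here)
    negated: bool  # negation flag from the NER+L tool's context analysis


def weak_label(m: Mention, min_len: int = 4) -> bool:
    """Hypothetical rules: reject negated mentions and short all-caps
    abbreviations, which are often ambiguous. In the paper's setting such
    rule outputs would train a confirmation model instead of filtering directly."""
    if m.negated:
        return False
    if m.text.isupper() and len(m.text) < min_len:
        return False
    return True


def text_to_umls(mentions: List[Mention]) -> Set[str]:
    """Step (i), simplified: keep UMLS concepts from mentions that pass the rules."""
    return {m.cui for m in mentions if weak_label(m)}


def umls_to_ordo(cuis: Set[str], cui_to_ordo: Dict[str, Set[str]]) -> Set[str]:
    """Step (ii): map confirmed UMLS concepts to rare diseases via an
    assumed UMLS-to-ORDO lookup table; unmapped concepts are dropped."""
    ordo: Set[str] = set()
    for cui in cuis:
        ordo |= cui_to_ordo.get(cui, set())
    return ordo


if __name__ == "__main__":
    # Toy, hypothetical identifiers; not real UMLS or ORDO codes.
    mapping = {"CUI_A": {"ORPHA_1"}, "CUI_B": {"ORPHA_2"}}
    mentions = [
        Mention("example rare disease", "CUI_A", negated=False),
        Mention("XY", "CUI_B", negated=False),             # short abbreviation, rejected
        Mention("other rare disease", "CUI_B", negated=True),  # negated, rejected
    ]
    print(sorted(umls_to_ordo(text_to_umls(mentions), mapping)))  # ['ORPHA_1']
```

Keeping the two steps as separate functions mirrors the framework's split: the mention-level confirmation in step (i) can be swapped for a learned model without touching the ontology lookup in step (ii).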




Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision

The identification of rare diseases from clinical notes with Natural Lan...

Classifying Unstructured Clinical Notes via Automatic Weak Supervision

Healthcare providers usually record detailed notes of the clinical care ...

Trove: Ontology-driven weak supervision for medical entity classification

Motivation: Recognizing named entities (NER) and their associated attrib...

Clinical Concept Extraction for Document-Level Coding

The text of clinical notes can be a valuable source of patient informati...

Hybrid Approaches for our Participation to the n2c2 Challenge on Cohort Selection for Clinical Trials

Objective: Natural language processing can help minimize human intervent...

Semantic rule Web-based Diagnosis and Treatment of Vector-Borne Diseases using SWRL rules

Vector-borne diseases (VBDs) are a kind of infection caused through the ...

Large Language Models Vote: Prompting for Rare Disease Identification

The emergence of generative Large Language Models (LLMs) emphasizes the ...
