The Healthy States of America: Creating a Health Taxonomy with Social Media

by Sanja Šćepanović et al.

Since the rise of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions, such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. Using this tool, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and found that these clusters correspond to well-defined categories of medical conditions. The result is the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related Health Problems (ICD-11), finding matches between our clusters and 20 of 22 official categories. Based on mentions of our taxonomy's sub-categories in Reddit posts geo-referenced in the U.S., we then computed disease-specific health scores. Unlike raw counts of disease mentions, or counts that ignore our taxonomy's structure, our disease-specific health scores are, we found, causally linked with the officially reported prevalence of 18 conditions.
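The step from extracted mentions to clusters of conditions can be illustrated with a minimal sketch. It assumes each post has already been reduced to a list of condition names; the Deep Learning extractor itself is not reproduced here. The abstract does not say which clustering algorithm was used, so this sketch swaps in a deliberately simple stand-in: keep co-occurrence edges whose weight reaches a threshold, then take connected components with a union-find. The function names, the `min_weight` parameter and the toy posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(posts):
    """Count how many posts mention each unordered pair of conditions.

    Hypothetical helper: `posts` is assumed to be a list of lists of
    condition names already extracted from each post.
    """
    edges = Counter()
    for conditions in posts:
        # Deduplicate within a post and sort so (a, b) and (b, a) share one key.
        for a, b in combinations(sorted(set(conditions)), 2):
            edges[(a, b)] += 1
    return edges


def clusters(edges, min_weight=2):
    """Group conditions into connected components over edges with weight >= min_weight.

    Stand-in for the paper's (unspecified here) clustering method.
    Conditions with no edge reaching the threshold are left out.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), weight in edges.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    # Sort members and clusters so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy input (invented for illustration, not data from the paper).
posts = [
    ["asthma", "allergy"],
    ["asthma", "allergy", "eczema"],
    ["depression", "anxiety"],
    ["anxiety", "depression", "insomnia"],
    ["eczema", "insomnia"],
]
edges = cooccurrence_edges(posts)
print(clusters(edges))  # [['allergy', 'asthma'], ['anxiety', 'depression']]
```

Raising or lowering `min_weight` trades cluster purity against coverage. With `min_weight=1` every condition in the toy data ends up in a single component, which is why a real pipeline would more likely use a weighted community-detection method instead of plain connectivity.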

