Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model

by Usman Naseem, et al.

User-generated text on social media enables health workers to keep track of information, identify possible outbreaks, forecast disease trends, monitor emergency cases, and gauge disease awareness and public response to official health communications. This exchange of health information on social media has been regarded as a way to enhance public health surveillance (PHS). Despite its potential, the technology is still in its early stages and is not ready for widespread application. Advances in pretrained language models (PLMs) have enabled the development of several domain-specific PLMs and a variety of downstream applications. However, there are no PLMs for social media tasks involving PHS. We present and release PHS-BERT, a transformer-based PLM for identifying tasks related to public health surveillance on social media. We benchmarked PHS-BERT on 25 datasets from different social media platforms, covering 7 different PHS tasks. Unlike existing PLMs, which are mainly evaluated on a limited set of tasks, PHS-BERT achieved state-of-the-art performance on all 25 tested datasets, showing that our PLM is robust and generalizable across common PHS tasks. By making PHS-BERT available, we aim to help the community reduce computational costs and to establish new baselines for future work across various PHS-related tasks.



