Scope of Pre-trained Language Models for Detecting Conflicting Health Information

by Joseph Gatto et al.

An increasing number of people now rely on online platforms to meet their health information needs. Thus, identifying inconsistent or conflicting textual health information has become a safety-critical task. Health advice data poses a unique challenge, as information that is accurate in the context of one diagnosis can be conflicting in the context of another. For example, people suffering from both diabetes and hypertension often receive conflicting dietary advice. This motivates the need for technologies that can provide contextualized, user-specific health advice. A crucial step toward contextualized advice is the ability to compare health advice statements and detect if and how they conflict. This is the task of health conflict detection (HCD). Given two pieces of health advice, the goal of HCD is to detect and categorize the type of conflict. It is a challenging task, as (i) automatically identifying and categorizing conflicts requires a deep understanding of the semantics of the text, and (ii) the amount of available data is quite limited. In this study, we are the first to explore HCD in the context of pre-trained language models. We find that DeBERTa-v3 performs best, with a mean F1 score of 0.68 across all experiments. We additionally investigate the challenges posed by different conflict types and how synthetic data improves a model's understanding of conflict-specific semantics. Finally, we highlight the difficulty of collecting real health conflicts and propose a human-in-the-loop synthetic data augmentation approach to expand existing HCD datasets. Our HCD training dataset is more than twice the size of the existing HCD dataset and is made publicly available on GitHub.
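The abstract frames HCD as pair classification: two advice statements go in, a conflict label comes out. The sketch below illustrates that task framing only. The label names and the lexical-overlap heuristic are hypothetical stand-ins invented for illustration; the paper itself fine-tunes pre-trained language models such as DeBERTa-v3 on advice pairs rather than using any rule-based scoring.

```python
# Toy sketch of health conflict detection (HCD) as sentence-pair
# classification. LABELS and the scoring heuristic are hypothetical;
# the paper's approach fine-tunes a pre-trained model (e.g. DeBERTa-v3)
# on pairs like the one produced by make_pair_input.

# Hypothetical conflict taxonomy (illustrative only).
LABELS = ["no_conflict", "direct_conflict", "conditional_conflict"]

def make_pair_input(advice_a: str, advice_b: str) -> str:
    """Join two advice statements into the single sequence a
    pair-classification model would consume."""
    return f"{advice_a} [SEP] {advice_b}"

def toy_conflict_score(advice_a: str, advice_b: str) -> float:
    """Crude lexical heuristic: high word overlap plus a negation
    mismatch suggests the statements cover the same topic but disagree."""
    negations = {"not", "avoid", "never", "don't"}
    a, b = set(advice_a.lower().split()), set(advice_b.lower().split())
    overlap = len((a & b) - negations) / max(len(a | b), 1)
    negation_mismatch = bool(a & negations) != bool(b & negations)
    return overlap + (0.5 if negation_mismatch else 0.0)

def classify_pair(advice_a: str, advice_b: str) -> str:
    """Map the heuristic score onto the hypothetical label set."""
    return "direct_conflict" if toy_conflict_score(advice_a, advice_b) >= 0.5 else "no_conflict"
```

In the real pipeline, `make_pair_input`'s role is played by the model's tokenizer, and `classify_pair` by a fine-tuned classification head; the point here is only the input/output shape of the task.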


