Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts

by Joseph Gatto, et al.

User-generated texts available on the web and social platforms are often long and semantically challenging, making them difficult to annotate. Obtaining human annotation becomes increasingly difficult as problem domains become more specialized. For example, many health NLP problems require domain experts to be part of the annotation pipeline. It is therefore crucial to develop low-resource NLP solutions capable of handling such limited-data problems. In this study, we employ Abstract Meaning Representation (AMR) graphs to model low-resource health NLP tasks sourced from various online health resources and communities. AMRs are well suited to modeling online health texts, as they can represent multi-sentence inputs, abstract away from complex terminology, and model long-distance relationships between co-referring tokens. AMRs thus improve the ability of pre-trained language models to reason about high-complexity texts. Our experiments show that we can improve performance on six low-resource health NLP tasks by augmenting text embeddings with semantic graph embeddings. Our approach is task-agnostic and easy to merge into any standard text classification pipeline. We experimentally validate that AMRs are useful for modeling complex texts by analyzing performance through the lens of two textual complexity measures: the Flesch-Kincaid Reading Level and syntactic complexity. Our error analysis shows that AMR-infused language models perform better on complex texts and generally exhibit less predictive variance as complexity changes.
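The fusion step described above can be sketched as a simple concatenation of the two embedding vectors before classification. This is a minimal illustration, not the paper's implementation: the function name, dimensions, and values below are all hypothetical, and any off-the-shelf text encoder and AMR graph encoder could supply the inputs.

```python
import numpy as np

def fuse_embeddings(text_emb: np.ndarray, graph_emb: np.ndarray) -> np.ndarray:
    """Concatenate a text embedding with an AMR graph embedding.

    The fused vector can be fed to any standard classifier head,
    which is what makes this augmentation task-agnostic.
    """
    return np.concatenate([text_emb, graph_emb])

# Toy example: a 4-dim "text" embedding and a 3-dim "graph" embedding.
text_emb = np.array([0.1, 0.2, 0.3, 0.4])
graph_emb = np.array([0.5, 0.6, 0.7])

fused = fuse_embeddings(text_emb, graph_emb)
print(fused.shape)  # (7,)
```

Because the graph embedding is simply appended to the text embedding, the downstream classifier needs no architectural changes beyond a wider input layer.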
