Time Will Change Things: An Empirical Study on Dynamic Language Understanding in Social Media Classification

10/06/2022
by Yuji Zhang, et al.

Language features are ever-evolving in real-world social media environments. Many trained natural language understanding (NLU) models, unable to perform semantic inference over unseen features, may consequently suffer deteriorating performance in such dynamic settings. To address this challenge, we empirically study social media NLU in a dynamic setup, where models are trained on past data and tested on future data. This better reflects realistic practice than the commonly adopted static setup of random data splits. To further analyze how models adapt to this dynamicity, we explore the usefulness of leveraging unlabeled data created after a model is trained. The experiments examine the performance of unsupervised domain adaptation baselines based on auto-encoding and pseudo-labeling, as well as a joint framework coupling the two. Results on four social media tasks indicate the universally negative effect of evolving environments on classification accuracy, while auto-encoding and pseudo-labeling together exhibit the best robustness to dynamicity.
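The core of the dynamic setup described above is a chronological train/test split rather than a random one. The following is a minimal sketch, not the authors' code, illustrating the two splitting protocols; the Post fields and the 80/20 cutoff are illustrative assumptions.

```python
# Sketch of the static (random) vs. dynamic (chronological) evaluation setups.
# Field names and the train ratio are assumptions for illustration only.
import random
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class Post:
    text: str
    label: int
    timestamp: datetime


def random_split(posts: List[Post], train_ratio: float = 0.8,
                 seed: int = 42) -> Tuple[List[Post], List[Post]]:
    """Static setup: shuffle posts and split, ignoring creation time."""
    shuffled = posts[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def chronological_split(posts: List[Post], train_ratio: float = 0.8
                        ) -> Tuple[List[Post], List[Post]]:
    """Dynamic setup: train on the earliest posts, test on the latest ones."""
    ordered = sorted(posts, key=lambda p: p.timestamp)
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]
```

In this setup, any unlabeled posts created after the training cutoff can be fed to the domain adaptation baselines (auto-encoding, pseudo-labeling, or both) before the model is evaluated on the future test portion.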


