Speech Detection Task Against Asian Hate: BERT the Central, While Data-Centric Studies the Crucial

06/05/2022
by Xin Lian, et al.

As the epidemic continues, hatred against Asians is intensifying in countries outside Asia, especially against Chinese people. Thus, there is an urgent need to detect and prevent hate speech toward Asians effectively. In this work, we first create COVID-HATE-2022, an annotated dataset that extends the anti-Asian hate speech dataset on Twitter and includes 2,035 annotated tweets fetched in early February 2022 and labeled according to specific criteria, and we present a comprehensive collection of scenarios of hate and non-hate tweets in the dataset. Second, we fine-tune BERT models on the relevant datasets and demonstrate that two strategies are not effective for improving performance: 1) cleaning hashtags, @-usernames, URLs, and emojis before fine-tuning, and 2) training with the original data while validating with the "clean" data (and the opposite). Third, we investigate the performance of advanced fine-tuning strategies with 1) model-centric approaches, such as discriminative fine-tuning, gradual unfreezing, and warmup steps, and 2) data-centric approaches, which incorporate data trimming and data augmentation, and show that both kinds of strategy generally improve performance, while the data-centric ones outperform the model-centric ones, which demonstrates the feasibility and effectiveness of data-centric approaches.
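
The cleaning step named in the abstract (removing hashtags, @-usernames, URLs, and emojis before fine-tuning) can be illustrated with a minimal sketch. This is not the authors' code: the function name `clean_tweet`, the regular expressions, the choice to drop a hashtag's whole token rather than only the `#` sign, and the emoji code-point ranges are all assumptions made for illustration, and the emoji ranges are not exhaustive.

```python
import re

# All patterns below are illustrative assumptions, not the paper's rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")   # @-usernames
HASHTAG_RE = re.compile(r"#\w+")   # assumption: drop the whole hashtag token
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicator letters used in flags
    "\U0000FE0F"             # variation selector-16
    "\U0000200D"             # zero-width joiner used in emoji sequences
    "]+"
)


def clean_tweet(text: str) -> str:
    """Strip URLs, @-usernames, hashtags, and emojis, then collapse whitespace."""
    # URLs go first so that '#' or '@' inside a link is removed with the link.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_tweet("@user Stay safe! 😷 #COVID19 https://t.co/abc123"))
```

Keeping both the original and the cleaned text for each tweet would allow the train-on-one, validate-on-the-other comparison that the abstract reports as not improving performance.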


