Abusive and Threatening Language Detection in Urdu using Boosting based and BERT based models: A Comparative Approach

by Mithun Das, et al.

Online hatred is a growing concern on many social media platforms. To address this issue, platforms have introduced moderation policies for such content and employ moderators who check posts that violate these policies and take appropriate action. Researchers in the abusive language domain also conduct studies to detect such content more effectively. Although there is extensive research on abusive language detection in English, there is a gap in abusive language detection for low-resource languages such as Hindi and Urdu. In the FIRE 2021 shared task "HASOC - Abusive and Threatening language detection in Urdu", the organizers propose a dataset for abusive language detection in Urdu, along with threatening language detection. In this paper, we explore several machine learning models, including XGBoost, LGBM, and m-BERT based models, for abusive and threatening content detection in Urdu based on the shared task. We observe that a Transformer model specifically trained on an abusive language dataset in Arabic gives the best performance. Our model ranked first for both abusive and threatening content detection, with F1 scores of 0.88 and 0.54, respectively.



