Trawling for Trolling: A Dataset

08/02/2020
by Hitkul, et al.

The ability to detect and filter offensive content accurately and automatically is important for ensuring a rich and diverse digital discourse. Trolling is a type of hurtful or offensive content that is prevalent on social media but underrepresented in datasets for offensive content detection. In this work, we present a dataset that models trolling as a subcategory of offensive content. The dataset was created by collecting samples from well-known datasets and reannotating them according to precise definitions of different categories of offensive content. It contains 12,490 samples split across five classes: Normal, Profanity, Trolling, Derogatory, and Hate Speech. It encompasses content from Twitter, Reddit, and Wikipedia Talk Pages. Models trained on our dataset show appreciable performance without any significant hyperparameter tuning and can potentially learn meaningful linguistic information. We find that these models are sensitive to data ablation, which suggests that the dataset is largely devoid of spurious statistical artefacts that could otherwise distract and confuse classification models.
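As a rough illustration of the ablation idea mentioned in the abstract, the sketch below lists the five classes and measures how much a classifier's accuracy drops when a random fraction of tokens is removed from each sample. The abstract does not describe the authors' ablation procedure, so random token dropping is an assumption made here, and the names `Label`, `ablate_tokens`, `accuracy`, and `ablation_gap` are hypothetical helpers, not part of the dataset or its accompanying code.

```python
import random
from enum import Enum


class Label(Enum):
    """The five classes named in the abstract."""
    NORMAL = "Normal"
    PROFANITY = "Profanity"
    TROLLING = "Trolling"
    DEROGATORY = "Derogatory"
    HATE_SPEECH = "Hate Speech"


def ablate_tokens(text, drop_fraction, rng):
    """Remove a random fraction of whitespace-separated tokens, keeping order.

    Assumption: random token dropping stands in for whatever ablation
    scheme the paper actually uses.
    """
    tokens = text.split()
    n_drop = int(len(tokens) * drop_fraction)
    drop_idx = set(rng.sample(range(len(tokens)), n_drop))
    return " ".join(t for i, t in enumerate(tokens) if i not in drop_idx)


def accuracy(predict, samples):
    """Fraction of (text, label) pairs that `predict` labels correctly."""
    correct = sum(1 for text, label in samples if predict(text) == label)
    return correct / len(samples)


def ablation_gap(predict, samples, drop_fraction, seed=0):
    """Accuracy on the original samples minus accuracy on ablated copies."""
    rng = random.Random(seed)
    ablated = [(ablate_tokens(t, drop_fraction, rng), y) for t, y in samples]
    return accuracy(predict, samples) - accuracy(predict, ablated)
```

Under this reading, a large positive gap means the model depends on the full text of each sample, while a gap near zero would hint that it relies on a few surface cues that survive ablation.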


