Unsupervised Domain Adaptation for Hate Speech Detection Using a Data Augmentation Approach

07/27/2021
by Sheikh Muhammad Sarwar, et al.

Online harassment in the form of hate speech has been on the rise in recent years. Addressing the issue requires content moderation by people, aided by automatic detection methods. Because content moderation is itself harmful to the people doing it, we aim to reduce that burden by improving the automatic detection of hate speech. Hate speech presents a challenge because it is directed at different target groups, each with a completely different vocabulary. Further, the authors of hate speech are incentivized to disguise their behavior to avoid being removed from a platform. This makes it difficult to develop a comprehensive data set for training and evaluating hate speech detection models, because the examples that represent one hate speech domain do not typically represent others, even within the same language or culture. We propose an unsupervised domain adaptation approach to augment labeled data for hate speech detection. We evaluate the approach with three different models (character CNNs, BiLSTMs and BERT) on three different collections. We show our approach improves Area under the Precision/Recall curve by as much as 42% and recall by as much as 278%, with no loss (and in some cases a significant gain) in precision.


research
07/19/2017

Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation

Domain mismatch between training and testing can lead to significant deg...
research
08/04/2021

Unsupervised Domain Adaptation in Speech Recognition using Phonetic Features

Automatic speech recognition is a difficult problem in pattern recogniti...
research
10/18/2022

Simple and Effective Unsupervised Speech Translation

The amount of labeled data to train models for speech tasks is limited f...
research
08/04/2023

From Fake to Hyperpartisan News Detection Using Domain Adaptation

Unsupervised Domain Adaptation (UDA) is a popular technique that aims to...
research
09/06/2020

Libri-Adapt: A New Speech Dataset for Unsupervised Domain Adaptation

This paper introduces a new dataset, Libri-Adapt, to support unsupervise...
research
05/23/2023

How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have

Due to the broad range of social media platforms and their user groups, ...
research
03/11/2019

Un duel probabiliste pour départager deux présidents (LIA @ DEFT'2005)

We present a set of probabilistic models applied to binary classificatio...
