Hate and Offensive Speech Detection in Hindi and Marathi

10/23/2021
by   Abhishek Velankar, et al.

Sentiment analysis is one of the most basic NLP tasks and is used to determine the polarity of text data. There has been a significant amount of work on multilingual text as well. Still, hate and offensive speech detection remains challenging because of the limited availability of data, especially for Indian languages like Hindi and Marathi. In this work, we consider hate and offensive speech detection in Hindi and Marathi texts. The problem is formulated as a text classification task using state-of-the-art deep learning approaches. We explore different deep learning architectures such as CNN and LSTM, and variations of BERT such as multilingual BERT, IndicBERT, and monolingual RoBERTa. The basic models based on CNN and LSTM are augmented with FastText word embeddings. We use the HASOC 2021 Hindi and Marathi hate speech datasets to compare these algorithms. The Marathi dataset consists of binary labels, and the Hindi dataset consists of binary as well as more fine-grained labels. We show that the transformer-based models perform the best and that even the basic models, combined with FastText embeddings, give competitive performance. Moreover, with standard hyperparameter tuning, the basic models perform better than BERT-based models on the fine-grained Hindi dataset.
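The abstract does not describe how the FastText embeddings are built or fed to the CNN and LSTM models, so the following is only a minimal sketch of the general subword idea behind FastText, not the authors' pipeline. FastText represents a word through its character n-grams, taken after wrapping the word in `<` and `>` boundary markers. The function name `char_ngrams` and its parameters are hypothetical, chosen for illustration; the 3 to 6 range is assumed here as a typical setting.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return FastText-style character n-grams of a single word.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from n-grams in the middle of the word.
    n_min and n_max are illustrative assumptions, not values from the paper.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of length n over the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


if __name__ == "__main__":
    # Trigrams only, for an English word.
    print(char_ngrams("where", 3, 3))
    # The same function works on Devanagari text; Python slices by code point,
    # so vowel signs count as separate characters in this sketch.
    print(char_ngrams("मराठी", 3, 3))
```

In a full model, each n-gram would map to a learned vector and the word vector would be formed from their sum; that step is omitted here because it depends on trained parameters the abstract does not provide.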


