Detecting Hate Speech in Multi-modal Memes

by Abhishek Das, et al.

In the past few years, there has been a surge of interest in multi-modal problems, from image captioning to visual question answering and beyond. In this paper, we focus on hate speech detection in multi-modal memes, which pose an interesting multi-modal fusion problem. We aim to solve the Facebook Hateful Memes Challenge <cit.>, a binary classification task of predicting whether a meme is hateful or not. A crucial characteristic of the challenge is that it includes "benign confounders" to counter the possibility of models exploiting unimodal priors; the challenge reports that state-of-the-art models perform poorly compared to humans. While analyzing the dataset, we observed that the majority of originally hateful data points are turned benign simply by describing the image of the meme, and that most multi-modal baselines give more weight to the language modality (the hateful text). To tackle these problems, we explore the visual modality using object detection and image captioning models to fetch the "actual caption" of the meme, and then combine it with the multi-modal representation to perform binary classification. This approach addresses the benign text confounders in the dataset and improves performance. In a second approach, we improve the prediction with sentiment analysis: instead of only using multi-modal representations obtained from pre-trained neural networks, we also include unimodal sentiment to enrich the features. We perform a detailed analysis of the above two approaches, providing compelling reasons in favor of the methodologies used.
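The two approaches share a common feature-level fusion step: the pre-trained multi-modal representation is enriched with an embedding of the generated caption and with unimodal sentiment scores before the binary classification head. The sketch below illustrates that step; the function names, feature dimensions, and the logistic head are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def fuse_features(multimodal_emb, caption_emb, sentiment_scores):
    """Concatenate the pre-trained multi-modal embedding with the
    generated-caption embedding and unimodal sentiment scores
    (hypothetical late fusion, for illustration only)."""
    return np.concatenate([multimodal_emb, caption_emb, sentiment_scores])

def classify(features, weights, bias):
    """Logistic binary head: estimated P(hateful | meme)."""
    logit = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logit))

# Toy example with made-up dimensions.
rng = np.random.default_rng(0)
multimodal_emb = rng.normal(size=8)     # e.g. from a pre-trained fusion model
caption_emb = rng.normal(size=4)        # e.g. from an image-captioning model
sentiment_scores = np.array([0.2, -0.1])  # text and image sentiment

fused = fuse_features(multimodal_emb, caption_emb, sentiment_scores)
prob_hateful = classify(fused, weights=np.zeros(fused.shape[0]), bias=0.0)
```

With untrained (zero) weights the head outputs 0.5; in practice the weights would be learned jointly with, or on top of, the pre-trained encoders.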


