Multimodal Learning for Hateful Memes Detection
Memes are multimedia documents that combine images and short phrases, usually to humorous effect. Hateful memes, however, spread hatred within social networks, and automatically detecting them would help reduce their harmful societal impact. Unlike conventional multimodal tasks, where the visual and textual information is semantically aligned, the challenge of hateful memes detection lies in its unique multimodal information: the modalities in a meme are weakly aligned or even irrelevant to each other, so a model must not only understand the content of the meme but also reason over the multiple modalities. In this paper, we focus on hateful memes detection for multimodal memes and propose a novel method that incorporates the image captioning process into the memes detection process. We conducted extensive experiments on multimodal meme datasets and demonstrated the effectiveness of our approach. Our model also achieves promising results on the Hateful Memes Challenge.
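To make the general idea concrete, the sketch below shows one plausible way a caption-augmented detector could be wired up in PyTorch: a captioner maps the meme image into text, both text streams are encoded, and the fused representation is classified. This is a minimal sketch under our own assumptions, not the architecture described in the paper; the class `CaptionAugmentedMemeClassifier`, the pluggable `captioner` and `text_encoder` modules, and all dimensions are hypothetical placeholders.

```python
# Illustrative sketch only: a caption-augmented hateful-meme classifier
# in the spirit of the abstract. All module names, dimensions, and the
# fusion scheme are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class CaptionAugmentedMemeClassifier(nn.Module):
    """Fuses the meme's overlaid text with a caption generated from
    the meme image, then classifies hateful vs. not hateful."""

    def __init__(self, captioner, text_encoder, hidden_dim=768):
        super().__init__()
        self.captioner = captioner        # image -> caption token ids (hypothetical module)
        self.text_encoder = text_encoder  # token ids -> pooled embedding (hypothetical module)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),     # logits: [not hateful, hateful]
        )

    def forward(self, image, meme_text_ids):
        # Captioning maps the image into the textual space, so the two
        # weakly aligned modalities can be compared in a shared space.
        caption_ids = self.captioner(image)
        caption_emb = self.text_encoder(caption_ids)   # (B, hidden_dim)
        text_emb = self.text_encoder(meme_text_ids)    # (B, hidden_dim)
        fused = torch.cat([text_emb, caption_emb], dim=-1)
        return self.classifier(fused)
```

In this sketch, routing the image through a captioner means the classifier reasons over two pieces of text rather than over raw pixels and text, which is one way to cope with weak cross-modal alignment.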