FiLMing Multimodal Sarcasm Detection with Attention

by Sundesh Gupta, et al.

Sarcasm detection identifies natural language expressions whose intended meaning differs from their surface meaning. It has applications in many NLP tasks, such as opinion mining and sentiment analysis. Social media has given rise to an abundance of multimodal data in which users express their opinions through text and images. Our paper aims to leverage multimodal data to improve the performance of existing sarcasm detection systems. So far, various approaches have been proposed that use the text modality, the image modality, or a fusion of both. We propose a novel architecture that uses the RoBERTa model with a co-attention layer on top to incorporate context incongruity between the input text and image attributes. Further, we integrate feature-wise affine transformation by conditioning the input image through FiLMed ResNet blocks on textual features obtained with the GRU network, in order to capture multimodal information. The outputs of both models and the CLS token from RoBERTa are concatenated and used for the final prediction. Our results demonstrate that our proposed model outperforms the existing state-of-the-art method by 6.14 dataset.
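The feature-wise affine transformation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal, dependency-free illustration of the general FiLM idea, in which each channel of an image feature map is scaled and shifted by values predicted from a text representation. The names `film`, `linear`, `text_state`, and all weights are hypothetical toy values, and the text vector merely stands in for a GRU output.

```python
def linear(weights, bias, vector):
    """Plain dense layer; weights has shape out_dim x in_dim."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def film(feature_map, gamma, beta):
    """Scale and shift each channel of a C x H x W feature map."""
    if not (len(feature_map) == len(gamma) == len(beta)):
        raise ValueError("need one (gamma, beta) pair per channel")
    return [
        [[g * value + b for value in row] for row in channel]
        for channel, g, b in zip(feature_map, gamma, beta)
    ]


# Hypothetical stand-in for a text representation (e.g. a GRU hidden state).
text_state = [1.0, -1.0]

# Toy image features: 2 channels, each 2 x 2.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]

# Two assumed projections predict the per-channel scale and shift from text.
gamma = linear([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], text_state)
beta = linear([[0.5, 0.5], [1.0, 0.0]], [0.0, 0.0], text_state)

modulated = film(feature_map, gamma, beta)
print(modulated)
```

In this toy setting the text vector amplifies the first channel and replaces the second with a constant, which shows how textual features can decide which visual channels matter. A real model would learn the projection weights and apply the modulation inside convolutional blocks.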


An AutoML-based Approach to Multimodal Image Sentiment Analysis

Sentiment analysis is a research topic focused on analysing data to extr...

Exploring Multimodal Sentiment Analysis via CBAM Attention and Double-layer BiLSTM Architecture

Because multimodal data contains more modal information, multimodal sent...

A Corpus of English-Hindi Code-Mixed Tweets for Sarcasm Detection

Social media platforms like Twitter and Facebook have become two of th...

CLMLF: A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection

Compared with unimodal data, multimodal data can provide more features t...

Do Images really do the Talking? Analysing the significance of Images in Tamil Troll meme classification

A meme is a part of media created to share an opinion or emotion across...

Multi-channel Attentive Graph Convolutional Network With Sentiment Fusion For Multimodal Sentiment Analysis

Nowadays, with the explosive growth of multimodal reviews on social medi...

Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts

Computing author intent from multimodal data like Instagram posts requir...
