Synthetic Misinformers: Generating and Combating Multimodal Misinformation

With the expansion of social media and the increasing dissemination of multimedia content, the spread of misinformation has become a major concern. This necessitates effective strategies for multimodal misinformation detection (MMD), which determine whether the combination of an image and its accompanying text could mislead or misinform. Due to the data-intensive nature of deep neural networks and the labor-intensive process of manual annotation, researchers have been exploring various methods for automatically generating synthetic multimodal misinformation, which we refer to as Synthetic Misinformers, in order to train MMD models. However, limited evaluation on real-world misinformation and a lack of comparisons with other Synthetic Misinformers make it difficult to assess progress in the field. To address this, we perform a comparative study on existing and new Synthetic Misinformers that involves (1) out-of-context (OOC) image-caption pairs, (2) cross-modal named entity inconsistency (NEI), and (3) hybrid approaches, and we evaluate them against real-world misinformation using the COSMOS benchmark. The comparative study showed that our proposed CLIP-based Named Entity Swapping can lead to MMD models that surpass other OOC and NEI Misinformers in terms of multimodal accuracy, and that hybrid approaches can lead to even higher detection accuracy. Nevertheless, after alleviating information leakage from the COSMOS evaluation protocol, low Sensitivity scores indicate that the task is significantly more challenging than previous studies suggested. Finally, our findings showed that NEI-based Synthetic Misinformers tend to suffer from a unimodal bias, where text-only MMDs can outperform multimodal ones.
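To make the named-entity-swapping idea concrete, the sketch below shows one plausible way to generate an NEI (named entity inconsistent) caption: replace a known entity in a truthful caption with the most similar entity from a pool, so the falsified caption stays plausible but is factually wrong. This is only an illustrative sketch, not the paper's implementation: the entity and pool are given explicitly rather than extracted by an NER model, and the `embed` function is a toy bag-of-characters stand-in for a real CLIP text encoder so the example is self-contained.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Placeholder for a CLIP text embedding: a bag-of-characters vector.
    In the actual method, this would be a CLIP text-encoder embedding."""
    return Counter(text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def swap_entity(caption: str, entity: str, pool: list[str]) -> str:
    """Replace `entity` in `caption` with the most similar pool entity
    (excluding the entity itself), yielding a plausible but false caption."""
    candidates = [e for e in pool if e != entity]
    target = embed(entity)
    best = max(candidates, key=lambda e: cosine(embed(e), target))
    return caption.replace(entity, best)


# Hypothetical example data, for illustration only.
caption = "Barack Obama speaks at the summit."
pool = ["Barack Obama", "Michelle Obama", "Angela Merkel"]
falsified = swap_entity(caption, "Barack Obama", pool)
print(falsified)  # the entity is swapped for its nearest pool neighbour
```

The design choice is that similarity-based swapping produces harder negatives than random swapping: the substituted entity shares surface and semantic features with the original, so a detector cannot rely on obvious incongruity.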

