VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias

Multimedia content has become ubiquitous on social media platforms, leading to the rise of multimodal misinformation (MM) and the urgent need for effective strategies to detect and prevent its spread. In recent years, the challenge of multimodal misinformation detection (MMD) has garnered significant attention from researchers and has mainly involved the creation of annotated, weakly annotated, or synthetically generated training datasets, along with the development of various deep learning MMD models. However, the problem of unimodal bias in MMD benchmarks – where biased or unimodal methods outperform their multimodal counterparts on an inherently multimodal task – has been overlooked. In this study, we systematically investigate and identify the presence of unimodal bias in widely used MMD benchmarks (VMU-Twitter, COSMOS), raising concerns about their suitability for reliable evaluation. To address this issue, we introduce the "VERification of Image-TExt pairs" (VERITE) benchmark for MMD, which incorporates real-world data, excludes "asymmetric multimodal misinformation", and utilizes "modality balancing". We conduct an extensive comparative study with a Transformer-based architecture that shows the ability of VERITE to effectively address unimodal bias, rendering it a robust evaluation framework for MMD. Furthermore, we introduce a new method – termed Crossmodal HArd Synthetic MisAlignment (CHASMA) – for generating realistic synthetic training data that preserve crossmodal relations between legitimate images and false human-written captions. By leveraging CHASMA in the training process, we observe consistent and notable improvements in predictive performance on VERITE, with a 9.2 at:
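To make the notion of unimodal bias concrete, the following is a minimal Python sketch of the comparison the abstract describes: a benchmark is flagged when an image-only or text-only model scores higher than the multimodal model on the same test set. The function names, the dictionary keys, and the toy labels are all hypothetical and chosen for illustration; this is not the evaluation protocol used in the study, and plain accuracy is assumed as the metric.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unimodal_bias_report(
    y_true: Sequence[int],
    predictions: Dict[str, Sequence[int]],
    multimodal_key: str = "multimodal",  # hypothetical key naming the multimodal model
) -> dict:
    """Compare each unimodal model against the multimodal one.

    A positive gap means the unimodal model outperforms the multimodal
    model, which is the symptom the abstract calls unimodal bias.
    """
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    multimodal_score = scores[multimodal_key]
    gaps = {
        name: score - multimodal_score
        for name, score in scores.items()
        if name != multimodal_key
    }
    return {
        "scores": scores,
        "gaps": gaps,
        "biased": any(gap > 0 for gap in gaps.values()),
    }


# Toy data only (1 = misinformation, 0 = truthful); not results from any benchmark.
toy_labels = [1, 0, 1, 0]
toy_predictions = {
    "image_only": [1, 0, 1, 0],
    "text_only": [1, 1, 1, 1],
    "multimodal": [1, 0, 1, 1],
}
report = unimodal_bias_report(toy_labels, toy_predictions)
```

In this toy case the image-only predictions match every label while the multimodal predictions miss one, so `report["biased"]` is true. On a benchmark free of unimodal bias, every gap would be expected to be zero or negative.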




