Multimodal Fusion Interactions: A Study of Human and Automatic Quantification

by Paul Pu Liang, et al.

Multimodal fusion of multiple heterogeneous and interconnected signals is a fundamental challenge in almost all multimodal problems and applications. To perform multimodal fusion, we need to understand the types of interactions that modalities can exhibit: how each modality individually provides information useful for a task, and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how human annotators can be leveraged to annotate two categorizations of multimodal interactions: (1) partial labels, where different randomly assigned annotators annotate the label given the first, second, and both modalities, and (2) counterfactual labels, where the same annotator annotates the label given the first modality, is then given the second modality, and is asked to explicitly reason about how their answer changes. We then propose an alternative taxonomy based on (3) information decomposition, where annotators annotate the degrees of redundancy (the extent to which modalities individually and together give the same predictions on the task), uniqueness (the extent to which one modality enables a task prediction that the other does not), and synergy (the extent to which only both modalities together enable a prediction about the task that one would not otherwise make using either modality individually). Through extensive experiments and annotations, we highlight several opportunities and limitations of each approach and propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition, yielding an accurate and efficient method for quantifying interactions in multimodal datasets.
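
The abstract gives no formulas for redundancy, uniqueness, and synergy. One common formalization, assumed here purely for illustration, is the partial information decomposition, which splits the information that two modalities carry about the label into these parts. The symbols below (X_1 and X_2 for the modalities, Y for the label, R, U_1, U_2, S for the four quantities) are illustrative and not taken from the text above.

```latex
\begin{align}
I(X_1, X_2; Y) &= R + U_1 + U_2 + S, \\
I(X_1; Y) &= R + U_1, \\
I(X_2; Y) &= R + U_2.
\end{align}
```

Under this reading, synergy is what remains once the individually available information is accounted for: S = I(X_1, X_2; Y) - I(X_1; Y) - I(X_2; Y) + R.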




