SHAPE: A Unified Approach to Evaluate the Contribution and Cooperation of Individual Modalities

by   Pengbo Hu, et al.

As deep learning advances, there is an ever-growing demand for models that can synthesize information from multi-modal resources to address the complex tasks arising in real-life applications. Recently, many large multi-modal datasets have been collected, on which researchers actively explore different methods of fusing multi-modal information. However, little attention has been paid to quantifying the contribution of different modalities within the proposed models. In this paper, we propose the SHapley vAlue-based PErceptual (SHAPE) scores, which measure the marginal contribution of individual modalities and the degree of cooperation across modalities. Using these scores, we systematically evaluate different fusion methods on different multi-modal datasets for different tasks. Our experiments suggest that for some tasks where different modalities are complementary, multi-modal models still tend to rely on the dominant modality alone and ignore cooperation across modalities. On the other hand, models learn to exploit cross-modal cooperation when different modalities are indispensable for the task; in this case, the scores indicate it is better to fuse the modalities at relatively early stages. We hope our scores can improve the understanding of how present multi-modal models operate on different modalities and encourage more sophisticated methods of integrating multiple modalities.
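The core idea behind a Shapley value-based score is to treat each modality as a "player" and average its marginal contribution to a task metric over all orderings of the other modalities. The sketch below shows the standard exact Shapley computation over modality subsets; the modality names and the accuracy numbers in the toy value function are purely illustrative assumptions, not results or code from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(modalities, value_fn):
    """Exact Shapley value of each modality.

    value_fn maps a frozenset of modality names to a task metric
    (e.g. accuracy of the model with the remaining modalities masked).
    """
    n = len(modalities)
    phi = {}
    for m in modalities:
        others = [x for x in modalities if x != m]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weight of coalitions of size k in the Shapley average.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(s | {m}) - value_fn(s))
        phi[m] = total
    return phi

# Toy value function: accuracy obtained with each subset of modalities
# (hypothetical numbers for illustration only).
acc = {
    frozenset(): 0.5,
    frozenset({"image"}): 0.8,
    frozenset({"text"}): 0.6,
    frozenset({"image", "text"}): 0.9,
}
print(shapley_values(["image", "text"], acc.get))
# → {'image': 0.3, 'text': 0.1}
```

Note the efficiency property: the per-modality values sum to the gain of the full model over the empty coalition (0.9 - 0.5 = 0.4 here), which is what makes the attribution interpretable as a decomposition of the overall performance.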
