What Makes Training Multi-Modal Networks Hard?

05/29/2019
by Weiyao Wang, et al.

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart. In our experiments, however, we observe the opposite: the best single-modal network always outperforms the multi-modal network. This observation is consistent across different combinations of modalities and across different tasks and benchmarks. This paper identifies two main causes of this performance drop. First, multi-modal networks are often prone to overfitting due to their increased capacity. Second, different modalities overfit and generalize at different rates, so training them jointly with a single optimization strategy is sub-optimal. We address these two problems with a technique we call Gradient Blending, which computes an optimal blend of modalities based on their overfitting behavior. We demonstrate that Gradient Blending outperforms widely-used baselines for avoiding overfitting and achieves state-of-the-art accuracy on various tasks, including fine-grained sport classification, human action recognition, and acoustic event detection.
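
The abstract does not spell out how the blend is computed, so the following is only a minimal sketch of the general idea: weight each modality's loss by how much it generalizes relative to how much it overfits. The weighting rule (validation gain divided by the squared growth of the train/validation gap), the function names `blend_weights` and `blended_loss`, and all loss values are assumptions made for illustration, not details confirmed by this text.

```python
def blend_weights(stats, eps=1e-8):
    """Return normalized per-head loss weights.

    stats maps a head name to (train_before, val_before, train_after, val_after),
    i.e. losses measured at two checkpoints. The rule below is an assumed
    heuristic: reward validation improvement, penalize a growing train/val gap.
    """
    raw = {}
    for name, (tr0, va0, tr1, va1) in stats.items():
        gain = va0 - va1                     # drop in validation loss
        overfit = (va1 - tr1) - (va0 - tr0)  # growth of the train/val gap
        raw[name] = max(gain, 0.0) / (overfit ** 2 + eps)
    total = sum(raw.values())
    if total == 0.0:
        # No head improved on validation: fall back to uniform weights.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: r / total for name, r in raw.items()}


def blended_loss(losses, weights):
    """Weighted sum of per-head losses, used in place of a single joint loss."""
    return sum(weights[name] * losses[name] for name in losses)


# Hypothetical measurements for two single-modal heads and the fused head.
stats = {
    "audio": (2.0, 2.1, 1.5, 1.9),
    "video": (2.0, 2.0, 1.0, 1.6),
    "joint": (2.0, 2.1, 0.5, 1.8),
}
weights = blend_weights(stats)
loss = blended_loss({"audio": 1.9, "video": 1.6, "joint": 1.8}, weights)
```

With these made-up numbers the fused head fits the training data fastest but its train/validation gap grows the most, so it receives the smallest weight. That mirrors the abstract's point that heads overfitting at different rates should not share one optimization strategy.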
