Curriculum Learning Meets Weakly Supervised Modality Correlation Learning

by   Sijie Mai, et al.

In the field of multimodal sentiment analysis (MSA), a few studies have leveraged the inherent modality correlation information stored in samples for self-supervised learning. However, they feed the training pairs in a random order without consideration of difficulty. Without human annotation, the generated training pairs of self-supervised learning often contain noise. If noisy or hard pairs are used for training at the easy stage, the model might be stuck in bad local optimum. In this paper, we inject curriculum learning into weakly supervised modality correlation learning. The weakly supervised correlation learning leverages the label information to generate scores for negative pairs to learn a more discriminative embedding space, where negative pairs are defined as two unimodal embeddings from different samples. To assist the correlation learning, we feed the training pairs to the model according to difficulty by the proposed curriculum learning, which consists of elaborately designed scoring and feeding functions. The scoring function computes the difficulty of pairs using pre-trained and current correlation predictors, where the pairs with large losses are defined as hard pairs. Notably, the hardest pairs are discarded in our algorithm, which are assumed as noisy pairs. Moreover, the feeding function takes the difference of correlation losses as feedback to determine the feeding actions (`stay', `step back', or `step forward'). The proposed method reaches state-of-the-art performance on MSA.


page 1

page 2

page 3

page 4


Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Representation Learning is a significant and challenging task in multimo...

Self-Supervision Closes the Gap Between Weak and Strong Supervision in Histology

One of the biggest challenges for applying machine learning to histopath...

Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring

Speech fluency/disfluency can be evaluated by analyzing a range of phone...

Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

Weakly-supervised audio-visual violence detection aims to distinguish sn...

C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification

Self-supervised learning (SSL) has drawn an increased attention in the f...

CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images

We present a simple yet efficient approach capable of training deep neur...

Angular Gap: Reducing the Uncertainty of Image Difficulty through Model Calibration

Curriculum learning needs example difficulty to proceed from easy to har...

Please sign up or login with your details

Forgot password? Click here to reset