Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis

by Peipei Liu, et al.

Modality representation learning is an important problem for multimodal sentiment analysis (MSA), since highly distinguishable representations can improve the analysis. Previous MSA works have usually focused on multimodal fusion strategies, while modality representation learning itself has received less attention. Recently, contrastive learning has been shown to endow learned representations with stronger discriminative ability. Inspired by this, we explore approaches to improving modality representations with contrastive learning. To this end, we devise a three-stage framework with multi-view contrastive learning that refines representations for specific objectives. In the first stage, to improve the unimodal representations, we employ supervised contrastive learning to pull samples of the same class together while pushing the other samples apart. In the second stage, self-supervised contrastive learning is designed to improve the distilled unimodal representations after cross-modal interaction. Finally, we again leverage supervised contrastive learning to enhance the fused multimodal representation. After all the contrastive training stages, we perform the classification task on the frozen representations. We conduct experiments on three open datasets, and the results show the advantage of our model.
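The supervised contrastive objective used in the first and third stages pulls same-class samples together and pushes differently labeled samples apart in embedding space. A minimal NumPy sketch of this loss (following the standard SupCon formulation; the function name, temperature default, and feature shapes here are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss over a batch of L2-normalized features.

    For each anchor, positives are the other samples sharing its label;
    the denominator contrasts against every other sample in the batch.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature                     # pairwise cosine similarities
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)               # exclude self-comparisons

    # log-softmax over all other samples, shifted by the row max for stability
    sim_max = sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * not_self
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))

    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & not_self
    pos_counts = pos_mask.sum(axis=1)

    # average log-probability over positives, for anchors that have any
    valid = pos_counts > 0
    per_anchor = -(log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return per_anchor.mean()
```

As a sanity check, a batch whose same-class features already cluster together should incur a lower loss than the same features with scrambled labels, which is the behavior that drives the representation refinement described above.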



