CMSBERT-CLR: Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations

08/21/2022
by Junghun Kim, et al.

Multimodal sentiment analysis has become an increasingly popular research area as demand for multimodal online content grows. In multimodal sentiment analysis, words can carry different meanings depending on both the linguistic context and the accompanying non-verbal information, so understanding a word's meaning requires interpreting it within the whole utterance context, including non-verbal cues. In this paper, we present CMSBERT-CLR, a Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, and acoustic Representations, which incorporates the verbal and non-verbal information of the whole context and aligns modalities more effectively through contrastive learning. First, we introduce Context-driven Modality Shifting (CMS) to incorporate verbal and non-verbal information across the whole context of the sentence utterance. Then, to better align the different modalities within a common embedding space, we apply contrastive learning. Furthermore, we use an exponential moving average of the parameters and label smoothing as optimization strategies, which stabilize the network's convergence and increase the flexibility of the alignment. In our experiments, we demonstrate that our approach achieves state-of-the-art results.
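The abstract names three generic ingredients: a contrastive objective that aligns modality embeddings in a common space, an exponential moving average (EMA) of the parameters, and label smoothing. The NumPy sketch below illustrates standard formulations of these mechanisms; it is not the paper's implementation, and all function names, shapes, and hyperparameter values (temperature, decay, smoothing factor) are illustrative assumptions.

```python
import numpy as np

def info_nce(text_emb, av_emb, temperature=0.07):
    """InfoNCE-style contrastive loss aligning paired text and
    audio-visual embeddings: matched pairs (row i of each matrix)
    are pulled together, mismatched rows pushed apart."""
    # L2-normalise rows so the dot product is cosine similarity
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = av_emb / np.linalg.norm(av_emb, axis=1, keepdims=True)
    logits = t @ a.T / temperature                 # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries are the positive (matched) pairs
    return -np.mean(np.diag(log_prob))

def ema_update(ema_params, params, decay=0.999):
    """Exponential moving average of parameters, a common way to
    stabilise convergence: ema <- decay * ema + (1 - decay) * current."""
    return {k: decay * ema_params[k] + (1 - decay) * params[k] for k in params}

def smooth_labels(one_hot, eps=0.1):
    """Standard label smoothing: soften hard targets to relax the
    alignment objective."""
    n_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / n_classes
```

As a sanity check, perfectly matched embedding pairs should yield a lower contrastive loss than random pairings, which is what drives the modalities toward a shared space during training.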

Related research
06/27/2023

ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis

Multimodal Sentiment Analysis leverages multimodal signals to detect the...
07/11/2018

Seq2Seq2Sentiment: Multimodal Sequence to Sequence Models for Sentiment Analysis

Multimodal machine learning is a core research area spanning the languag...
08/31/2021

Improving Multimodal fusion via Mutual Dependency Maximisation

Multimodal sentiment analysis is a trending area of research, and the mu...
08/22/2022

Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

Multimodal sentiment analysis (MSA), which supposes to improve text-base...
12/19/2018

Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities

Multimodal sentiment analysis is a core research area that studies speak...
09/10/2023

Unified Contrastive Fusion Transformer for Multimodal Human Action Recognition

Various types of sensors have been considered to develop human action re...
11/23/2018

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors

Humans convey their intentions through the usage of both verbal and nonv...
