An Empirical Study and Improvement for Speech Emotion Recognition

04/08/2023
by   Zhen Wu, et al.

Multimodal speech emotion recognition aims to detect speakers' emotions from audio and text. Prior work mainly focuses on designing advanced networks to model and fuse information from the different modalities, while neglecting the effect of the fusion strategy itself on emotion recognition. In this work, we consider a simple yet important question: which way of fusing audio and text information is most helpful for this multimodal task? Further, we propose a multimodal emotion recognition model improved by a perspective loss. Empirical results show that our method achieves new state-of-the-art results on the IEMOCAP dataset. An in-depth analysis explains why the improved model outperforms the baselines.


