A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation

by Jiangnan Li, et al.

Emotion Recognition in Conversation (ERC) is more challenging than conventional text emotion recognition. It can be regarded as a personalized and interactive emotion recognition task that must consider not only the semantic information of the text but also the influence of the speakers. The current approach models speakers' interactions by building a relation between every pair of speakers. However, this fine-grained but complicated modeling is computationally expensive, hard to extend, and can only consider local context. To address this problem, we simplify the modeling to a binary version, Intra-Speaker and Inter-Speaker dependencies, without identifying every unique speaker for the targeted speaker. To realize this simplified speaker interaction modeling in the Transformer, which excels at capturing long-distance dependencies, we design three types of masks and apply each in one of three independent Transformer blocks. The masks model conventional context, Intra-Speaker dependency, and Inter-Speaker dependency, respectively. Furthermore, the speaker-aware information extracted by the different Transformer blocks contributes unequally to the prediction, so we use an attention mechanism to weight it automatically. Experiments on two ERC datasets indicate that our model is effective and achieves better performance.
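
The three masks and the attention-based weighting described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact mask definitions, so the code assumes that the conventional mask lets every utterance attend to every other, the Intra-Speaker mask keeps only pairs from the same speaker, and the Inter-Speaker mask keeps only pairs from different speakers (plus the utterance itself, an assumption made here so that no row is empty). The names `build_speaker_masks` and `fuse_views` are hypothetical.

```python
import numpy as np


def build_speaker_masks(speakers):
    """Return (conventional, intra, inter) boolean masks of shape (n, n).

    mask[i, j] is True when utterance i may attend to utterance j.
    Only "same speaker" vs. "different speaker" is used; individual
    speaker identities are never distinguished beyond that.
    """
    ids = np.asarray(speakers)
    n = len(ids)
    same = ids[:, None] == ids[None, :]       # pairwise same-speaker test
    conventional = np.ones((n, n), dtype=bool)  # full context (assumed)
    intra = same.copy()                       # same speaker only
    # Assumption: keep the diagonal so every row has at least one True.
    inter = ~same | np.eye(n, dtype=bool)
    return conventional, intra, inter


def fuse_views(views, scores):
    """Weight the outputs of the three blocks per utterance.

    views:  array of shape (3, n, d), one output per Transformer block.
    scores: array of shape (3, n), unnormalized relevance scores
            (in a real model these would be learned; here they are inputs).
    """
    views = np.asarray(views, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Numerically stable softmax over the three blocks.
    weights = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[:, :, None] * views).sum(axis=0)


if __name__ == "__main__":
    conv, intra, inter = build_speaker_masks(["A", "B", "A"])
    print(intra.astype(int))
    print(inter.astype(int))
```

In an attention layer, positions where a mask is False would typically be excluded before the softmax, so each block sees a different subset of the conversation while the binary speaker relation keeps the cost independent of the number of speakers.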


