MTGAT: Multimodal Temporal Graph Attention Networks for Unaligned Human Multimodal Language Sequences

10/22/2020
by   Jianing Yang, et al.
0

Human communication is multimodal in nature; it is through multiple modalities, i.e., language, voice, and facial expressions, that opinions and emotions are expressed. Data in this domain exhibits complex multi-relational and temporal interactions. Learning from this data is a fundamentally challenging research problem. In this paper, we propose Multimodal Temporal Graph Attention Networks (MTGAT). MTGAT is an interpretable graph-based neural model that provides a suitable framework for analyzing this type of multimodal sequential data. We first introduce a procedure to convert unaligned multimodal sequence data into a graph with heterogeneous nodes and edges that captures the rich interactions between different modalities through time. Then, a novel graph operation, called Multimodal Temporal Graph Attention, along with a dynamic pruning and read-out technique is designed to efficiently process this multimodal temporal graph. By learning to focus only on the important interactions within the graph, our MTGAT is able to achieve state-of-the-art performance on multimodal sentiment analysis and emotion recognition benchmarks including IEMOCAP and CMU-MOSI, while utilizing significantly fewer computations.

READ FULL TEXT
08/12/2018

Multimodal Language Analysis with Recurrent Multistage Fusion

Computational modeling of human multimodal language is an emerging resea...
08/17/2021

Graph Capsule Aggregation for Unaligned Multimodal Sequences

Humans express their opinions and emotions through multiple modalities w...
02/03/2018

Multi-attention Recurrent Network for Human Communication Comprehension

Human face-to-face communication is a complex multimodal signal. We use ...
11/27/2020

Analyzing Unaligned Multimodal Sequence via Graph Convolution and Graph Pooling Fusion

In this paper, we study the task of multimodal sequence analysis which a...
10/06/2021

Unsupervised Multimodal Language Representations using Convolutional Autoencoders

Multimodal Language Analysis is a demanding area of research, since it i...
05/22/2018

Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment

Multimodal affective computing, learning to recognize and interpret huma...
11/22/2019

Factorized Multimodal Transformer for Multimodal Sequential Learning

The complex world around us is inherently multimodal and sequential (con...

Please sign up or login with your details

Forgot password? Click here to reset