Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

11/10/2019
by Chao Zhang, et al.

Since 2010, deep learning has revolutionized speech recognition, image recognition, and natural language processing, each of which involves a single modality in the input signal. However, many applications in artificial intelligence involve more than one modality. It is therefore of broad interest to study the more difficult and complex problem of modeling and learning across multiple modalities. This paper provides a technical review of the models and learning methods for multimodal intelligence. The main focus is the combination of vision and natural language, which has become an important area in both the computer vision and natural language processing research communities. This review provides a comprehensive analysis of recent work on multimodal deep learning from three new angles: learning multimodal representations, the fusion of multimodal signals at various levels, and multimodal applications. On multimodal representation learning, we review the key concept of embedding, which unifies the multimodal signals into the same vector space and thus enables cross-modality signal processing. We also review the properties of the many types of embedding constructed and learned for general downstream tasks. On multimodal fusion, this review focuses on special architectures for integrating the representations of unimodal signals for a particular task. On applications, selected areas of broad interest in the current literature are covered, including caption generation, text-to-image generation, and visual question answering. We believe this review can facilitate future studies in the emerging field of multimodal intelligence.
