MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning

06/04/2020
by Miguel Vasco, et al.

Humans are able to create rich representations of their external reality. These internal representations allow for cross-modality inference, where available perceptions can induce the perceptual experience of missing input modalities. In this paper, we contribute the Multimodal Hierarchical Variational Auto-encoder (MHVAE), a hierarchical multimodal generative model for representation learning. Inspired by human cognitive models, the MHVAE learns modality-specific distributions for an arbitrary number of modalities, as well as a joint-modality distribution responsible for cross-modality inference. We formally derive the model's evidence lower bound and propose a novel methodology to approximate the joint-modality posterior based on modality-specific representation dropout. We evaluate the MHVAE on standard multimodal datasets. Our model performs on par with other state-of-the-art generative models in joint-modality reconstruction from arbitrary input modalities and in cross-modality inference.
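The modality-specific representation dropout mentioned above can be illustrated with a small sketch: each modality has its own encoder producing Gaussian posterior parameters, and during training some modalities' representations are randomly dropped before the survivors are fused into the joint-modality posterior. All names below are hypothetical, and the fusion rule (simple averaging) is an illustrative assumption rather than the paper's exact derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Hypothetical modality-specific encoder: maps an input to
    # Gaussian posterior parameters (mu, logvar).
    h = np.tanh(x @ W)
    d = h.shape[-1] // 2
    return h[..., :d], h[..., d:]

def joint_posterior(stats, drop_prob=0.5):
    # Modality-specific representation dropout: each modality's
    # (mu, logvar) is randomly dropped during training; the survivors
    # are fused (here by simple averaging, an assumed fusion rule).
    kept = [s for s in stats if rng.random() > drop_prob]
    if not kept:  # always keep at least one modality
        kept = [stats[rng.integers(len(stats))]]
    mus = np.stack([m for m, _ in kept])
    logvars = np.stack([lv for _, lv in kept])
    return mus.mean(axis=0), logvars.mean(axis=0)

# Toy usage: two modalities with 4-dim inputs and a 3-dim latent.
W_img = rng.normal(size=(4, 6))
W_snd = rng.normal(size=(4, 6))
x_img, x_snd = rng.normal(size=4), rng.normal(size=4)

mu, logvar = joint_posterior([encode(x_img, W_img), encode(x_snd, W_snd)])
z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)  # reparameterization
```

Because the fused posterior is trained under random modality subsets, the same code path supports inference from any subset of modalities at test time, which is what enables cross-modality inference.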

Related research:

- 06/16/2018 · Learning Factorized Multimodal Representations
- 10/07/2021 · How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents
- 05/29/2018 · Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data
- 01/12/2023 · Multimodal Deep Learning
- 07/27/2023 · Cortex Inspired Learning to Recover Damaged Signal Modality with ReD-SOM Model
- 11/18/2019 · Modality To Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion
- 02/07/2022 · GMC – Geometric Multimodal Contrastive Representation Learning
