Unimodal Face Classification with Multimodal Training

12/08/2021
by   Wenbin Teng, et al.
Face recognition is a crucial task in many multimedia applications such as security checks, credential access, and motion-sensing games. However, the task is challenging when the input face is noisy (e.g., a poor-condition RGB image) or lacks certain information (e.g., a 3D face without color). In this work, we propose a Multimodal Training Unimodal Test (MTUT) framework for robust face classification, which exploits the cross-modality relationship during training and applies it at test time to complement an imperfect single-modality input. Technically, during training the framework (1) builds both intra-modality and cross-modality autoencoders, aided by facial attributes, to learn latent embeddings as multimodal descriptors, and (2) introduces a novel multimodal embedding divergence loss to align the heterogeneous features from different modalities, which also adaptively prevents a useless modality (if any) from confusing the model. As a result, the learned autoencoders can generate robust embeddings for single-modality face classification at test time. We evaluate our framework on two face classification datasets and two kinds of test input: (1) poor-condition images and (2) point clouds or 3D face meshes, with both 2D and 3D modalities available during training. We experimentally show that our MTUT framework consistently outperforms ten baselines in both the 2D and 3D settings of both datasets.
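The abstract describes a multimodal embedding divergence loss that pulls the embeddings of the 2D and 3D modalities together during training, optionally down-weighting an unreliable modality. The abstract does not give the formula, so the snippet below is only an illustrative sketch of such an alignment loss: the toy linear "encoders", the L2-normalized squared-distance divergence, and the scalar `weight` used for the adaptive down-weighting are all assumptions, not the paper's actual method.

```python
import numpy as np

def encode(x, W):
    """Toy linear 'encoder' standing in for an intra-modality autoencoder."""
    return x @ W

def embedding_divergence_loss(z_a, z_b, weight=1.0):
    """Sketch of a cross-modality embedding divergence loss: mean squared
    distance between L2-normalized embeddings of two modalities.
    `weight` could be lowered adaptively for a noisy/useless modality."""
    za = z_a / (np.linalg.norm(z_a, axis=1, keepdims=True) + 1e-8)
    zb = z_b / (np.linalg.norm(z_b, axis=1, keepdims=True) + 1e-8)
    return weight * float(np.mean(np.sum((za - zb) ** 2, axis=1)))

# Usage on random stand-ins for 2D image and 3D mesh features.
rng = np.random.default_rng(0)
x2d = rng.normal(size=(4, 16))          # batch of 2D face features
x3d = rng.normal(size=(4, 24))          # batch of 3D face features
W2d = rng.normal(size=(16, 8))          # projects both modalities into a
W3d = rng.normal(size=(24, 8))          # shared 8-dim embedding space
loss = embedding_divergence_loss(encode(x2d, W2d), encode(x3d, W3d))
```

Minimizing such a term during multimodal training encourages either encoder alone to produce embeddings usable by a shared classifier, which is the property the unimodal test stage relies on.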


