A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

08/17/2023
by   Li Liu, et al.

Body language (BL) refers to non-verbal communication expressed through physical movements, gestures, facial expressions, and postures. It conveys information, emotions, attitudes, and intentions without spoken or written words, plays a crucial role in interpersonal interactions, and can complement or even override verbal communication. Deep multi-modal learning techniques have shown promise in understanding and analyzing these diverse aspects of BL, and this survey emphasizes their applications to BL generation and recognition. Four common BLs are considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and Talking Head (TH), and we analyze and establish the connections among these four BLs for the first time. Their generation and recognition often involve multi-modal approaches. Benchmark datasets for BL research are collected and organized, along with an evaluation of SOTA methods on these datasets. The survey highlights challenges such as limited labeled data, multi-modal learning, and the need for domain adaptation to generalize models to unseen speakers or languages. Future research directions are presented, including exploring self-supervised learning techniques, integrating contextual information from other modalities, and exploiting large-scale pre-trained multi-modal models. In summary, this survey provides, for the first time, a comprehensive understanding of deep multi-modal learning for the generation and recognition of various BLs. By analyzing advancements, challenges, and future directions, it serves as a valuable resource for researchers and practitioners in advancing this field. In addition, we maintain a continuously updated paper list for deep multi-modal learning for BL recognition and generation: https://github.com/wentaoL86/awesome-body-language.


