Multimodal Video-based Apparent Personality Recognition Using Long Short-Term Memory and Convolutional Neural Networks

11/01/2019
by Süleyman Aslan et al.

Personality computing and affective computing, in which the recognition of personality traits is essential, have recently gained increasing interest in many research areas. We propose a novel approach to recognizing the Big Five personality traits of people from videos. Personality and emotion affect speaking style, facial expressions, body movements, and linguistic factors in social contexts, and they are in turn affected by environmental elements. We develop a multimodal system that recognizes apparent personality from several modalities: face, environment, audio, and transcription features. We use modality-specific neural networks that learn to recognize the traits independently, and we obtain a final prediction of apparent personality through feature-level fusion of these networks. We employ pre-trained deep convolutional neural networks, such as ResNet and VGGish, to extract high-level features, and Long Short-Term Memory networks to integrate temporal information. We train the full model, which consists of modality-specific subnetworks, in two stages: we first train the subnetworks separately and then fine-tune the overall model built from these trained networks. We evaluate the proposed method on the ChaLearn First Impressions V2 challenge dataset. Our approach obtains the best overall "mean accuracy" score, averaged over the five personality traits, compared to the state of the art.
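The overall architecture described above (modality-specific branches whose hidden features are concatenated for a final trait prediction) can be sketched in a minimal, framework-free form. This is an illustrative toy model, not the authors' implementation: the feature dimensions are hypothetical, the pre-trained CNN extractors and LSTMs are stood in for by mean pooling plus a linear layer, and all weights are random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame feature sizes for each modality (not from the paper)
# and a shared hidden size. The Big Five traits give 5 outputs in [0, 1].
MODALITIES = {"face": 512, "environment": 512, "audio": 128, "text": 300}
HIDDEN, TRAITS = 64, 5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SubNetwork:
    """Stand-in for one modality-specific branch. In the paper, a pre-trained
    CNN (e.g. ResNet or VGGish) produces per-frame features and an LSTM
    integrates them over time; here temporal integration is approximated by
    mean pooling followed by a linear layer."""
    def __init__(self, in_dim):
        self.W = rng.normal(0.0, 0.01, (in_dim, HIDDEN))
        self.head = rng.normal(0.0, 0.01, (HIDDEN, TRAITS))  # stage-1 head

    def features(self, frames):
        # frames: (T, in_dim) -> pooled hidden representation (HIDDEN,)
        return np.tanh(frames.mean(axis=0) @ self.W)

    def predict(self, frames):
        # Stage 1: each branch predicts the traits independently.
        return sigmoid(self.features(frames) @ self.head)

class FusionModel:
    """Feature-level fusion: concatenate every branch's hidden features and
    map the result to the five trait scores (stage 2 fine-tunes the whole)."""
    def __init__(self, subnets):
        self.subnets = subnets
        self.W_fuse = rng.normal(0.0, 0.01, (HIDDEN * len(subnets), TRAITS))

    def predict(self, clips):
        # clips: dict mapping modality name -> (T, in_dim) frame features
        fused = np.concatenate([net.features(clips[m])
                                for m, net in self.subnets.items()])
        return sigmoid(fused @ self.W_fuse)

subnets = {m: SubNetwork(d) for m, d in MODALITIES.items()}
model = FusionModel(subnets)
clips = {m: rng.normal(size=(30, d)) for m, d in MODALITIES.items()}  # 30 frames
scores = model.predict(clips)
print(scores.shape)  # one score per Big Five trait
```

The two-stage training the abstract mentions would correspond to first fitting each `SubNetwork.head` on its own modality, then training `W_fuse` (and fine-tuning the branches) end to end.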

Related research

09/19/2017
Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions
In this paper, we present a novel deep learning based approach for addre...

08/02/2018
RGB Video Based Tennis Action Recognition Using a Deep Weighted Long Short-Term Memory
Action recognition has attracted increasing attention from RGB input in ...

09/25/2019
Single-modal and Multi-modal False Arrhythmia Alarm Reduction using Attention-based Convolutional and Recurrent Neural Networks
This study proposes a deep learning model that effectively suppresses th...

12/15/2021
Head Matters: Explainable Human-centered Trait Prediction from Head Motion Dynamics
We demonstrate the utility of elementary head-motion units termed kineme...

10/31/2016
Bi-modal First Impressions Recognition using Temporally Ordered Deep Audio and Stochastic Visual Features
We propose a novel approach for First Impressions Recognition in terms o...

02/20/2023
Explainable Human-centered Traits from Head Motion and Facial Expression Dynamics
We explore the efficacy of multimodal behavioral cues for explainable pr...

04/23/2021
The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System
Memories are the tethering threads that tie us to the world, and memorab...
