Two-stage Temporal Modelling Framework for Video-based Depression Recognition using Graph Representation

by Jiaqi Xu, et al.

Video-based automatic depression analysis provides a fast, objective and repeatable self-assessment solution, and has been widely developed in recent years. While depression clues may be reflected by human facial behaviours at various temporal scales, most existing approaches model depression from either short-term or video-level facial behaviours alone. To address this, we propose a two-stage framework that models depression severity from both multi-scale short-term and video-level facial behaviours. The short-term depressive behaviour modelling stage first learns deep depression-related facial behavioural features at multiple short temporal scales, where a Depression Feature Enhancement (DFE) module is proposed to enhance the depression-related clues at all temporal scales and remove non-depression noise. Then, the video-level depressive behaviour modelling stage proposes two novel graph encoding strategies, i.e., Sequential Graph Representation (SEG) and Spectral Graph Representation (SPG), to re-encode all short-term features of the target video into a video-level graph representation that summarizes depression-related multi-scale video-level temporal information. As a result, the produced graph representations predict depression severity using both short-term and long-term facial behaviour patterns. The experimental results on the AVEC 2013 and AVEC 2014 datasets show that the proposed DFE module consistently enhanced the depression severity estimation performance of various CNN models, while SPG is superior to other video-level modelling methods. More importantly, the results achieved by the proposed two-stage framework show its promising and solid performance compared to widely-used one-stage modelling approaches.
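The abstract does not specify how SEG or SPG are built, so the following is only a minimal sketch of the general idea behind a spectral video-level encoding: a variable-length sequence of short-term clip features (shape `T x D`) is turned into a fixed-size graph, so that videos of any length can be compared at the video level. The function name `spectral_graph`, the use of FFT amplitudes as node features, the number of retained frequencies `k`, and the cosine-similarity `threshold` used to create edges are all hypothetical choices made for illustration; this is not the authors' SPG.

```python
import numpy as np


def spectral_graph(features: np.ndarray, k: int = 8, threshold: float = 0.5):
    """Hypothetical spectral graph encoding of one video (NOT the authors' SPG).

    features: array of shape (T, D), one D-dim short-term feature per clip,
              where T varies from video to video.
    Returns (nodes, adjacency):
      nodes:     (D, k) array; node i holds the k lowest-frequency FFT
                 amplitudes of feature dimension i over time.
      adjacency: (D, D) 0/1 array linking nodes whose spectra have cosine
                 similarity >= threshold (no self-loops).
    """
    if features.ndim != 2 or features.shape[0] == 0:
        raise ValueError("features must have shape (T, D) with T >= 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    t = features.shape[0]

    # Zero-pad short videos so rfft yields at least k frequency bins
    # (rfft of length n returns n // 2 + 1 bins).
    n = max(t, 2 * (k - 1))
    spectrum = np.fft.rfft(features, n=n, axis=0)  # (n // 2 + 1, D), complex

    # Keep the k lowest frequencies; divide by T so amplitudes are comparable
    # across videos of different lengths.
    nodes = (np.abs(spectrum[:k]) / t).T  # (D, k)

    # Cosine similarity between node spectra (amplitudes are non-negative).
    norms = np.linalg.norm(nodes, axis=1, keepdims=True)
    unit = nodes / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    sim = 0.5 * (sim + sim.T)  # remove any floating-point asymmetry
    adjacency = (sim >= threshold).astype(np.float64)
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    return nodes, adjacency


# Usage: two videos of different lengths map to graphs of the same size.
rng = np.random.default_rng(0)
for length in (40, 250):
    nodes, adj = spectral_graph(rng.normal(size=(length, 16)))
    print(nodes.shape, adj.shape)
```

Because the output size depends only on `D` and `k`, every video yields a graph with the same number of nodes and node-feature size, which a downstream graph network or regressor (also an assumption here) could consume to predict a severity score.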




Domain-specific Learning of Multi-scale Facial Dynamics for Apparent Personality Traits Prediction

Human personality decides various aspects of their daily life and workin...

Non-verbal Facial Action Units-based Automatic Depression Classification

Depression is a common mental disorder that causes people to experience ...

Learning Person-specific Network Representation for Apparent Personality Traits Recognition

Recent studies show that apparent personality traits can be reflected fr...

FTM: A Frame-level Timeline Modeling Method for Temporal Graph Representation Learning

Learning representations for graph-structured data is essential for grap...

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

Action detection is an essential and challenging task, especially for de...

HGV4Risk: Hierarchical Global View-guided Sequence Representation Learning for Risk Prediction

Risk prediction, as a typical time series modeling problem, is usually a...

Modelling Paralinguistic Properties in Conversational Speech to Detect Bipolar Disorder and Borderline Personality Disorder

Bipolar disorder (BD) and borderline personality disorder (BPD) are two ...
