Attention-Based Acoustic Feature Fusion Network for Depression Detection

by Xiao Xu, et al.

Depression, a common mental disorder, significantly affects individuals and imposes a considerable burden on society. The complexity and heterogeneity of the disorder call for prompt and effective detection, which nonetheless remains difficult. This situation highlights an urgent need for improved detection methods. Exploiting speech data with advanced machine learning methods is a promising research direction. Yet existing techniques rely mainly on single-dimensional feature models, potentially neglecting the wealth of information carried by different speech characteristics. To address this, we present the Attention-Based Acoustic Feature Fusion Network (ABAFnet) for depression detection. ABAFnet combines four different acoustic features in a single deep learning model, thereby integrating multi-tiered features. We also introduce a novel weight adjustment module for late fusion that boosts performance by combining these features effectively. We validate the approach extensively on two clinical speech databases, CNRAC and CS-NRAC, where it outperforms previous methods in depression detection and subtype classification. Further in-depth analysis confirms the key role of each feature and highlights the importance of MFCC-related features in speech-based depression detection.
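The abstract does not spell out how the weight adjustment module for late fusion works. As a rough illustration only, the sketch below assumes that each of the four acoustic-feature branches outputs a depression probability and that the branches are combined with softmax-normalized weights. The function names, the scores, and the probabilities are hypothetical and are not taken from the paper; in a trained model the scores would be learned rather than fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(branch_probs, branch_scores):
    """Fuse per-branch probabilities with softmax attention weights.

    branch_probs:  one probability per acoustic-feature branch (assumed).
    branch_scores: one unnormalized relevance score per branch (assumed;
                   a real model would learn these during training).
    Returns the fused probability and the normalized weights.
    """
    if len(branch_probs) != len(branch_scores):
        raise ValueError("need exactly one score per branch")
    weights = softmax(branch_scores)
    fused = sum(w * p for w, p in zip(weights, branch_probs))
    return fused, weights


# Four illustrative branches; all numbers are made up for this sketch.
probs = [0.62, 0.55, 0.71, 0.48]

# Equal scores give equal weights, so fusion reduces to a plain average.
fused_equal, weights_equal = late_fusion(probs, [0.0, 0.0, 0.0, 0.0])

# A much larger score lets one branch dominate the fused prediction.
fused_skewed, weights_skewed = late_fusion(probs, [10.0, 0.0, 0.0, 0.0])
```

With equal scores the fused value is simply the mean of the branch outputs; raising one score shifts the fused prediction toward that branch, which is the general behaviour a learned weighting scheme would exploit.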




Adaptive Feature Fusion: Enhancing Generalization in Deep Learning Models

In recent years, deep learning models have demonstrated remarkable succe...

AudVowelConsNet: A Phoneme-Level Based Deep CNN Architecture for Clinical Depression Diagnosis

Depression is a common and serious mood disorder that negatively affects...

Giving Attention to the Unexpected: Using Prosody Innovations in Disfluency Detection

Disfluencies in spontaneous speech are known to be associated with proso...

Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning

In this paper, we propose a classification based glottal closure instant...

Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

We present two multimodal fusion-based deep learning models that consume...

Sports highlights generation based on acoustic events detection: A rugby case study

We approach the challenging problem of generating highlights from sports...

Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs

Autonomous soundscape augmentation systems typically use trained models ...
