DEPA: Self-supervised audio embedding for depression detection

10/29/2019
by Heinrich Dinkel, et al.

Research on depression detection has grown over the last few decades as the disease has become an increasingly prominent social problem. One major bottleneck in developing automatic depression detection methods is limited data availability. Recently, pretrained text embeddings have seen success in sparse-data scenarios, while pretrained audio embeddings have rarely been investigated. This paper proposes DEPA, a self-supervised, Word2Vec-like pretrained depression audio embedding method for depression detection. An encoder-decoder network is used to extract DEPA on sparse-data in-domain (DAIC) and large-data out-of-domain (Switchboard, Alzheimer's) datasets. With DEPA as the audio embedding, detection performance significantly exceeds that of traditional audio features on both classification and regression metrics. Moreover, we show that large-data out-of-domain pretraining is beneficial to depression detection performance.
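
The abstract does not spell out how the Word2Vec-like objective is set up on audio. One common reading, offered here purely as an illustrative assumption and not as the authors' procedure, is to cut a spectrogram into fixed-length segments and train an encoder-decoder to predict a center segment from its neighbours. The sketch below only prepares such (context, center) pairs; the function name `make_context_center_pairs` and its parameters `frames_per_segment` and `context_segments` are hypothetical.

```python
import numpy as np


def make_context_center_pairs(spec, frames_per_segment, context_segments):
    """Build Word2Vec-style (context, center) pairs from a spectrogram.

    ASSUMPTION: this segmentation scheme is illustrative only; the abstract
    does not describe how DEPA forms its training targets.

    spec: array of shape (time, n_features), e.g. a log-mel spectrogram.
    Returns (contexts, centers) with shapes
      contexts: (n_pairs, 2 * context_segments, frames_per_segment, n_features)
      centers:  (n_pairs, frames_per_segment, n_features)
    """
    n_segments = spec.shape[0] // frames_per_segment
    if n_segments < 2 * context_segments + 1:
        raise ValueError("spectrogram too short for the requested context")

    # Drop trailing frames that do not fill a whole segment, then reshape
    # into (n_segments, frames_per_segment, n_features).
    usable = spec[: n_segments * frames_per_segment]
    segs = usable.reshape(n_segments, frames_per_segment, spec.shape[1])

    contexts, centers = [], []
    for i in range(context_segments, n_segments - context_segments):
        left = segs[i - context_segments : i]           # segments before the center
        right = segs[i + 1 : i + 1 + context_segments]  # segments after the center
        contexts.append(np.concatenate([left, right], axis=0))
        centers.append(segs[i])                         # hidden target segment
    return np.stack(contexts), np.stack(centers)


# Toy input: 28 frames with 3 features each, standing in for a real spectrogram.
toy_spec = np.arange(28 * 3, dtype=np.float32).reshape(28, 3)
toy_contexts, toy_centers = make_context_center_pairs(toy_spec, 4, 2)
```

Under this assumption, a model would receive each `contexts` entry as input and be trained to reconstruct the matching `centers` entry; the encoder output could then serve as the audio embedding for a downstream detector.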


