EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition

03/10/2021
by Maurice Gerczuk, et al.

This manuscript approaches multi-corpus Speech Emotion Recognition (SER) from a deep transfer learning perspective. A large corpus of emotional speech data, EmoSet, is assembled from 26 existing SER corpora. In total, EmoSet contains 84,181 audio recordings with a total duration of over 65 hours. The corpus is then used to create EmoNet, a novel framework for multi-corpus SER. A combination of a deep ResNet architecture and residual adapters is transferred from the field of multi-domain visual recognition to multi-corpus SER on EmoSet. Compared against two suitable baselines and against more traditional training and transfer settings for the ResNet, the residual adapter approach enables parameter-efficient training of a multi-domain SER model on all 26 corpora. A shared model with only 3.5 times the number of parameters of a model trained on a single database leads to increased performance for 21 of the 26 corpora in EmoSet. Measured by McNemar's test, these improvements are significant at p<0.05 for ten datasets, while just two corpora see only significant decreases across the residual adapter transfer experiments. Finally, we make our EmoNet framework publicly available for users and developers at https://github.com/EIHW/EmoNet. EmoNet provides an extensive, comprehensively documented command line interface and can be used in a variety of multi-corpus transfer learning settings.
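To illustrate how a significance statement such as "significant at p<0.05 by McNemar's test" can be obtained for two classifiers evaluated on the same test samples, the following is a minimal sketch in plain Python. It is not taken from the EmoNet code base. The function name `mcnemar_exact`, the toy labels, and the choice of the exact binomial variant of the test are assumptions made for illustration; the abstract does not state which variant the authors used.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions (hypothetical helper).

    Returns (b, c, p_value), where b counts samples that only model A
    classified correctly and c counts samples that only model B
    classified correctly.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all sequences must have the same length")

    # Only discordant pairs (exactly one model correct) carry information.
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree in correctness

    # Under the null hypothesis, b follows Binomial(n, 0.5).
    # Two-sided p-value: twice the smaller tail, capped at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Toy data (made up): model A is right on 1 sample, model B on the other 9.
y_true = [0] * 10
pred_a = [0] + [1] * 9
pred_b = [1] + [0] * 9
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
significant = p_value < 0.05  # p_value = 22/1024, roughly 0.0215
```

The test only looks at samples where exactly one of the two models is correct, which is why it suits comparing a baseline and a transfer model on a fixed test partition. For larger numbers of discordant pairs, a chi-square approximation with continuity correction is a common alternative.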

