"I have vxxx bxx connexxxn!": Facing Packet Loss in Deep Speech Emotion Recognition

05/15/2020
by   Mostafa M. Mohamed, et al.
In applications that rely on speech emotion recognition, frame loss can be a severe issue: the audio stream loses some data frames for a variety of reasons, such as low bandwidth. In this contribution, we investigate for the first time the effects of frame loss on the performance of speech emotion recognition. We report extensive, reproducible experiments on the popular RECOLA corpus using a state-of-the-art end-to-end deep neural network that consists mainly of convolution blocks and recurrent layers. A simple environment based on a Markov chain model, governed by two main parameters, is used to model the loss mechanism. We explore matched, mismatched, and multi-condition training settings. As expected, the matched setting yields the best performance and the mismatched setting the lowest. Furthermore, we introduce frame loss as a data augmentation technique, a general-purpose strategy for overcoming the effects of frame loss. It can be applied during training, and we observed that it produces models that are more robust against frame loss in run-time environments.
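The abstract does not name the two parameters of the Markov chain loss model, so the following is only an illustrative sketch under stated assumptions. It assumes a two-state chain (received, lost) driven by a hypothetical `p_loss` (probability of moving from received to lost) and a hypothetical `p_recover` (probability of moving from lost to received), and it assumes that a lost frame is replaced by zeros. The function names are invented for this example and are not taken from the paper.

```python
import random


def simulate_loss_mask(num_frames, p_loss, p_recover, seed=None):
    """Return a list of booleans, True where a frame is lost.

    Assumed two-state Markov chain (not confirmed by the abstract):
      p_loss    -- probability of moving from 'received' to 'lost'
      p_recover -- probability of moving from 'lost' to 'received'
    """
    rng = random.Random(seed)
    lost = False  # assumption: the stream starts in the 'received' state
    mask = []
    for _ in range(num_frames):
        if lost:
            # Stay lost with probability 1 - p_recover.
            lost = rng.random() >= p_recover
        else:
            # Become lost with probability p_loss.
            lost = rng.random() < p_loss
        mask.append(lost)
    return mask


def apply_frame_loss(frames, mask, fill=0.0):
    """Replace every lost frame with a frame of `fill` values.

    Zero-filling is an assumption made for illustration only.
    """
    return [
        [fill] * len(frame) if is_lost else list(frame)
        for frame, is_lost in zip(frames, mask)
    ]


if __name__ == "__main__":
    # Toy "audio": 10 frames of 4 samples each.
    frames = [[float(i)] * 4 for i in range(10)]
    mask = simulate_loss_mask(len(frames), p_loss=0.2, p_recover=0.5, seed=0)
    corrupted = apply_frame_loss(frames, mask)
    print(sum(mask), "of", len(mask), "frames lost")
```

Under these assumptions, using frame loss as data augmentation would amount to drawing a fresh mask for each training example and feeding the corrupted frames to the model. For a two-state chain of this form, the long-run fraction of lost frames is `p_loss / (p_loss + p_recover)`, which gives one way to choose parameters that match a target loss rate.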
