Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment

02/15/2018
by   Caroline Etienne, et al.

In this work, we design a neural network for recognizing emotions in speech, using the standard IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture that combines convolutional layers, which extract high-level features from raw spectrograms, with recurrent layers, which aggregate long-term dependencies. By applying data augmentation, layer-wise learning rate adjustment, and batch normalization, we obtain highly competitive results, with 64.5% accuracy on four emotions. Moreover, we show that model performance is strongly correlated with labeling confidence, which highlights a fundamental difficulty in emotion recognition.
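
As an illustration of what layer-wise learning rate adjustment can look like, the following is a minimal sketch in plain Python. It assumes one common variant, in which each layer's step size is scaled by the ratio of its weight norm to its gradient norm (a LARS-style rule); the abstract does not specify the exact rule used, so this should not be read as the authors' method. The function name `layerwise_sgd_step`, the layer names, and all numeric values are hypothetical.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def layerwise_sgd_step(params, grads, base_lr=0.01, eps=1e-8):
    """One SGD step in which every layer gets its own learning rate.

    params, grads: dicts mapping a layer name to a flat list of floats.
    Assumed rule (not taken from the paper): lr_layer = base_lr * ||w|| / ||g||.
    """
    new_params = {}
    layer_lrs = {}
    for name, weights in params.items():
        g = grads[name]
        w_norm = l2_norm(weights)
        g_norm = l2_norm(g)
        # Fall back to the base rate when either norm is zero.
        if w_norm > 0 and g_norm > 0:
            ratio = w_norm / (g_norm + eps)
        else:
            ratio = 1.0
        lr = base_lr * ratio
        layer_lrs[name] = lr
        new_params[name] = [w - lr * gi for w, gi in zip(weights, g)]
    return new_params, layer_lrs


# Toy example: a "conv" layer with small gradients and an "lstm" layer
# with large gradients end up with very different step sizes.
params = {"conv": [3.0, 4.0], "lstm": [0.6, 0.8]}
grads = {"conv": [0.5, 0.0], "lstm": [0.0, 2.0]}
new_params, layer_lrs = layerwise_sgd_step(params, grads, base_lr=0.1)
```

Under this assumed rule, the size of each layer's update is roughly `base_lr` times the norm of that layer's weights, so convolutional and recurrent layers move by a similar relative amount even when their gradient magnitudes differ widely.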
