Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction

07/06/2019
by Mohammed Senoussaoui, et al.

In this paper we present a novel approach for extracting a Bag-of-Words (BoW) representation based on a neural network codebook. The conventional BoW model relies on a dictionary (codebook) built from elementary representations that are selected either randomly or by running a clustering algorithm on a training dataset. A metric is then used to assign unseen elementary representations to the closest dictionary entries in order to produce a histogram. In the proposed approach, an autoencoder (AE) takes on both roles: it creates the dictionary and it provides the assignment metric. The dimension of the encoded layer of the AE corresponds to the size of the dictionary, and the outputs of its neurons serve as the assignment metric. Experimental results for the continuous emotion prediction task on the AVEC 2017 audio dataset show an improvement of the Concordance Correlation Coefficient (CCC) from 0.225 to 0.322 for the arousal dimension and from 0.244 to 0.368 for the valence dimension, relative to the conventional BoW version implemented in a baseline system.
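The idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the AE architecture, activation function, training procedure, or how activations are pooled into a histogram. The single sigmoid layer, the random weights standing in for a trained encoder, the sum-and-normalise pooling, and the names `encode`, `ae_bag_of_words`, and `concordance_cc` are all assumptions made for illustration. The CCC function uses the standard definition of the coefficient.

```python
import numpy as np


def encode(frames, W, b):
    """Hypothetical single-layer encoder: one sigmoid unit per codebook entry."""
    return 1.0 / (1.0 + np.exp(-(frames @ W + b)))


def ae_bag_of_words(frames, W, b):
    """Pool encoder activations over a window of frames into a histogram.

    Assumption: activations act as soft assignments, summed over frames
    and normalised to sum to 1.
    """
    activations = encode(frames, W, b)  # shape: (n_frames, codebook_size)
    hist = activations.sum(axis=0)      # one bin per encoded-layer neuron
    return hist / hist.sum()


def concordance_cc(x, y):
    """Concordance Correlation Coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)


# Toy data: random weights stand in for a trained encoder (assumption).
rng = np.random.default_rng(0)
n_features, codebook_size = 8, 16
W = rng.normal(size=(n_features, codebook_size))
b = np.zeros(codebook_size)
window = rng.normal(size=(50, n_features))  # 50 frames of low-level features

hist = ae_bag_of_words(window, W, b)
print(hist.shape, hist.sum())
```

In this sketch the codebook size is simply the width of the encoded layer, so changing `codebook_size` changes the length of the histogram without any separate clustering step. The CCC penalises both low correlation and a shift in mean or scale between predictions and labels, which is why a perfectly correlated but offset prediction scores below 1.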


