Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting

by Purvi Agrawal, et al.

Learning interpretable representations from raw data is particularly challenging for time-series data such as speech. In this work, we propose a relevance weighting scheme that makes the speech representations interpretable during the forward propagation of the model itself. Relevance weighting is implemented with a sub-network that performs feature selection. A relevance sub-network applied to the output of the first layer of a convolutional neural network (CNN) operating on raw speech signals acts as an acoustic filterbank (FB) layer with relevance weighting. A similar relevance sub-network applied to the second convolutional layer performs modulation filterbank learning with relevance weighting. The full acoustic model, consisting of relevance sub-networks, convolutional layers, and feed-forward layers, is trained for speech recognition on noisy and reverberant speech from the Aurora-4, CHiME-3, and VOiCES datasets. The proposed representation learning framework is also applied to sound classification on the UrbanSound8K dataset. A detailed analysis of the learned relevance weights shows that they capture information about the underlying speech or audio content. In addition, speech recognition and sound classification experiments show that incorporating relevance weighting into the neural network architecture significantly improves performance.
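To make the idea of a relevance sub-network concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes the filterbank layer produces an (F, T) array (F filters, T frames), that a small two-layer feed-forward network shared across filters maps each filter's time trajectory to a scalar score, and that a softmax over filters turns the scores into relevance weights that rescale the filterbank output. The function names `softmax` and `relevance_weighting`, the tanh hidden layer, and all sizes are hypothetical choices for illustration, and the parameters and input are random stand-ins rather than trained weights or real speech.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def relevance_weighting(fb_out, w1, b1, w2, b2):
    """Hypothetical relevance sub-network (illustrative, not the paper's exact design).

    fb_out : (F, T) array, one row per filter (F filters over T frames).
    w1, b1 : first feed-forward layer shared across filters, shapes (T, H) and (H,).
    w2, b2 : second feed-forward layer, shapes (H, 1) and (1,).
    Returns the reweighted representation (F, T) and the relevance weights (F,).
    """
    hidden = np.tanh(fb_out @ w1 + b1)       # (F, H): embed each filter's time trajectory
    scores = (hidden @ w2 + b2).squeeze(-1)  # (F,): one relevance score per filter
    weights = softmax(scores, axis=0)        # normalize so the weights sum to 1 over filters
    weighted = fb_out * weights[:, None]     # scale each filter's output by its weight
    return weighted, weights


# Usage with random stand-in data (not real speech); sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
F, T, H = 80, 101, 16
fb = np.abs(rng.standard_normal((F, T)))    # non-negative stand-in "filterbank energies"
w1 = 0.1 * rng.standard_normal((T, H))
b1 = np.zeros(H)
w2 = 0.1 * rng.standard_normal((H, 1))
b2 = np.zeros(1)

weighted, weights = relevance_weighting(fb, w1, b1, w2, b2)
top = np.argsort(weights)[::-1][:5]          # indices of the five highest-weighted filters
print(weighted.shape, weights.shape, top)
```

In a trained model the parameters `w1`, `b1`, `w2`, `b2` would be learned jointly with the rest of the acoustic model, so the weights inspected after training indicate which filters the network relies on. Under the same assumptions, a comparable sub-network could be placed after the second convolutional layer by treating each modulation filter's (flattened) output as one row, which mirrors the modulation filterbank weighting described above.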




Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

Speech recognition in noisy and channel distorted scenarios is often cha...

A Multi-Head Relevance Weighting Framework For Learning Raw Waveform Audio Representations

In this work, we propose a multi-head relevance weighting framework to l...

Interpretable Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection

In this paper, we describe an approach for representation learning of au...

Interpreting intermediate convolutional layers of CNNs trained on raw speech

This paper presents a technique to interpret and visualize intermediate ...

Interpreting deep urban sound classification using Layer-wise Relevance Propagation

After constructing a deep neural network for urban sound classification,...

Explainable AI for Time Series via Virtual Inspection Layers

The field of eXplainable Artificial Intelligence (XAI) has greatly advan...

Double Relief with progressive weighting function

Feature weighting algorithms try to solve a problem of great importance ...
