Visualizing Automatic Speech Recognition – Means for a Better Understanding?

02/01/2022
by Karla Markert, et al.

Automatic speech recognition (ASR) is becoming ever better at mimicking human speech processing. How ASR works, however, remains largely obscured by the complex structure of the deep neural networks (DNNs) on which it is based. In this paper, we show how so-called attribution methods, which we import from image recognition and suitably adapt to handle audio data, can help clarify the workings of ASR. Taking DeepSpeech, an end-to-end ASR model, as a case study, we show how these techniques help visualize which features of the input are most influential in determining the output. We focus on three visualization techniques: Layer-wise Relevance Propagation (LRP), Saliency Maps, and Shapley Additive Explanations (SHAP). We compare these methods and discuss potential further applications, such as the detection of adversarial examples.
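To make the saliency-map idea concrete, here is a minimal sketch of vanilla gradient saliency. It assumes a hypothetical toy model (`toy_score`, a single tanh layer over a flattened frame of audio features) in place of DeepSpeech. It is not the authors' implementation and does not cover LRP or SHAP. A saliency map assigns each input feature the magnitude of the gradient of one output score with respect to that feature, so larger values mark features whose small changes most affect the output.

```python
import numpy as np

def toy_score(x, A, w):
    """Hypothetical stand-in for one output logit of an ASR model:
    score = w . tanh(A x), where x is a flattened frame of audio features."""
    return float(w @ np.tanh(A @ x))

def saliency_map(x, A, w):
    """Vanilla gradient saliency: |d score / d x_i| for every input feature i."""
    h = np.tanh(A @ x)
    # Chain rule: d tanh(u)/du = 1 - tanh(u)^2, then back through the linear map A.
    grad = A.T @ (w * (1.0 - h ** 2))
    return np.abs(grad)

# Random toy parameters and input (illustrative only, not real audio data).
rng = np.random.default_rng(0)
n_features, n_hidden = 8, 4
A = rng.normal(size=(n_hidden, n_features))
w = rng.normal(size=n_hidden)
x = rng.normal(size=n_features)

sal = saliency_map(x, A, w)
print("most influential feature index:", int(np.argmax(sal)))
```

For a real network, the gradient would come from the framework's automatic differentiation rather than a hand-derived formula. The result is typically plotted over time and frequency to show which parts of the input drive a given output.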
