Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

03/07/2018
by Wei-Ning Hsu, et al.

The performance of automatic speech recognition (ASR) systems can be significantly compromised by previously unseen conditions, typically because of a mismatch between training and testing distributions. In this paper, we address robustness by studying domain invariant features, such that domain information becomes transparent to ASR systems, resolving the mismatch problem. Specifically, we investigate a recent model, called the Factorized Hierarchical Variational Autoencoder (FHVAE). FHVAEs learn to factorize sequence-level and segment-level attributes into different latent variables without supervision. We argue that the set of latent variables that contain segment-level information is our desired domain invariant feature for ASR. Experiments conducted on Aurora-4 and CHiME-4 demonstrate 41% and 27% absolute word error rate reductions, respectively, on mismatched domains.

research
07/19/2017

Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation

Domain mismatch between training and testing can lead to significant deg...
research
04/12/2019

Unsupervised Speech Domain Adaptation Based on Disentangled Representation Learning for Robust Speech Recognition

In general, the performance of automatic speech recognition (ASR) system...
research
09/22/2017

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

We present a factorized hierarchical variational autoencoder, which lear...
research
09/24/2022

Unsupervised domain adaptation for speech recognition with unsupervised error correction

The transcription quality of automatic speech recognition (ASR) systems ...
research
04/15/2021

Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching

End-to-end automatic speech recognition (ASR) can achieve promising perf...
research
09/20/2023

Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition

Crafting an effective Automatic Speech Recognition (ASR) solution for di...
research
05/05/2022

Unsupervised Mismatch Localization in Cross-Modal Sequential Data

Content mismatch usually occurs when data from one modality is translate...
