Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition

by Anurenjan Purushothaman, et al.

Speech recognition in far-field environments is adversely affected by reverberation artifacts, which manifest as temporal smearing of the sub-band envelopes. In this paper, we develop a neural model for speech dereverberation using the long-term sub-band envelopes of speech. The sub-band envelopes are derived using frequency domain linear prediction (FDLP), which performs an autoregressive estimation of the Hilbert envelopes. The neural dereverberation model estimates the envelope gain which, when applied to reverberant signals, suppresses the late reflection components in the far-field signal. The dereverberated envelopes are used for feature extraction in speech recognition. Further, the sequence of steps involved in envelope dereverberation, feature extraction and acoustic modeling for ASR can be implemented as a single neural processing pipeline, which allows the joint learning of the dereverberation network and the acoustic model. Several experiments are performed on the REVERB challenge dataset, the CHiME-3 dataset and the VOiCES dataset. In these experiments, the joint learning of envelope dereverberation and acoustic model yields significant performance improvements over the baseline ASR system based on log-mel spectrogram, as well as over other past approaches for dereverberation (average relative improvements of 10-24% over the baseline system). A detailed analysis of the choice of hyper-parameters and the cost function involved in envelope dereverberation is also provided.
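To make the FDLP step in the abstract concrete, the following is a minimal sketch of how an autoregressive envelope estimate can be computed: linear prediction is applied to the DCT of the signal, and the power spectrum of the resulting all-pole model is read as a smooth estimate of the temporal (Hilbert) envelope. The function name `fdlp_envelope`, the model order, and the full-band (no sub-band windowing) setup are assumptions made for illustration; this is not the authors' implementation, and the neural gain estimation is not shown.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz


def fdlp_envelope(x, order=20, n_points=None):
    """Illustrative full-band FDLP envelope (hypothetical helper).

    Fits an all-pole model to the DCT coefficients of `x` with the
    autocorrelation method; the model's power spectrum, evaluated on a
    grid over [0, pi), approximates the temporal envelope of `x`.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_points = n if n_points is None else n_points

    # Frequency-domain representation on which prediction is performed.
    c = dct(x, type=2, norm="ortho")

    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)])
    r[0] *= 1.0 + 1e-9  # tiny regularization for numerical stability

    # Solve the Toeplitz normal equations R a = -r for a_1..a_p.
    a_tail = solve_toeplitz(r[:order], -r[1 : order + 1])
    a = np.concatenate(([1.0], a_tail))

    # Prediction error energy, used as the model gain.
    gain = r[0] + np.dot(a_tail, r[1 : order + 1])

    # The all-pole "spectrum" is indexed by time, since the input to the
    # predictor was a frequency-domain sequence.
    _, h = freqz([1.0], a, worN=n_points)
    return gain * np.abs(h) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    t = np.arange(n)
    # Noise burst centred at sample 300.
    burst = rng.standard_normal(n) * np.exp(-0.5 * ((t - 300) / 30.0) ** 2)
    env = fdlp_envelope(burst, order=20)
    print("envelope peak near sample", int(np.argmax(env)))
```

In a sub-band setting, the same procedure would be applied to windowed segments of the DCT sequence, one per band, so that each band gets its own envelope; the model order controls how smooth the estimated envelope is.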


Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

Automatic speech recognition in reverberant conditions is a challenging ...

3-D Feature and Acoustic Modeling for Far-Field Speech Recognition

Automatic speech recognition in multi-channel reverberant conditions is ...

Surprisal-Triggered Conditional Computation with Neural Networks

Autoregressive neural network models have been used successfully for seq...

Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

Speech recognition in noisy and channel distorted scenarios is often cha...

Do WaveNets Dream of Acoustic Waves?

Various sources have reported the WaveNet deep learning architecture bei...

Role of non-linear data processing on speech recognition task in the framework of reservoir computing

The reservoir computing neural network architecture is widely used to te...

Trainable Frontend For Robust and Far-Field Keyword Spotting

Robust and far-field speech recognition is critical to enable true hands...
