3-D Feature and Acoustic Modeling for Far-Field Speech Recognition

by   Anurenjan Purushothaman, et al.

Automatic speech recognition in multi-channel reverberant conditions is a challenging task. The conventional way of suppressing the reverberation artifacts involves a beamforming based enhancement of the multi-channel speech signal, which is used to extract spectrogram based features for a neural network acoustic model. In this paper, we propose to extract features directly from the multi-channel speech signal using a multi variate autoregressive (MAR) modeling approach, where the correlations among all the three dimensions of time, frequency and channel are exploited. The MAR features are fed to a convolutional neural network (CNN) architecture which performs the joint acoustic modeling on the three dimensions. The 3-D CNN architecture allows the combination of multi-channel features that optimize the speech recognition cost compared to the traditional beamforming models that focus on the enhancement task. Experiments are conducted on the CHiME-3 and REVERB Challenge dataset using multi-channel reverberant speech. In these experiments, the proposed 3-D feature and acoustic modeling approach provides significant improvements over an ASR system trained with beamformed audio (average relative improvements of 10 respectively.


page 1

page 2

page 3

page 4


Deep Learning Based Dereverberation of Temporal Envelopesfor Robust Speech Recognition

Automatic speech recognition in reverberant conditions is a challenging ...

CGCNN: Complex Gabor Convolutional Neural Network on raw speech

Convolutional Neural Networks (CNN) have been used in Automatic Speech R...

Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition

The task of speech recognition in far-field environments is adversely af...

Speaker Adapted Beamforming for Multi-Channel Automatic Speech Recognition

This paper presents, in the context of multi-channel ASR, a method to ad...

Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition

Acoustic models in real-time speech recognition systems typically stack ...

Recurrent Models for Auditory Attention in Multi-Microphone Distance Speech Recognition

Integration of multiple microphone data is one of the key ways to achiev...

Please sign up or login with your details

Forgot password? Click here to reset