The Marchex 2018 English Conversational Telephone Speech Recognition System
In this paper, we describe recent improvements to the production Marchex speech recognition system for spontaneous customer-to-business telephone conversations. We outline our semi-supervised lattice-free maximum mutual information (LF-MMI) training process, which can use full lattices decoded from unlabeled audio as supervision. We also elaborate on production-scale text selection techniques for constructing very large conversational language models (LMs). On Marchex English (ME), a modern evaluation set of conversational North American English, we report a 3.3% absolute reduction in word error rate (WER) from acoustic modeling and a separate 1.3% reduction from language modeling, both measured against the performance of the 2017 production system.
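To make the reported metric concrete, the following is a minimal sketch of how WER and its absolute and relative reductions are commonly computed. It is not the Marchex scoring pipeline: the function names are hypothetical, whitespace tokenization is an assumption, and the baseline figure in the usage note is an illustrative placeholder, since the abstract does not state the underlying WER values.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance, reference_length) over whitespace-split words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1], len(ref)


def wer(pairs):
    """Corpus-level WER in percent: total word errors over total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return 100.0 * errors / words


def absolute_reduction(baseline_wer, new_wer):
    """Difference in WER, in percentage points."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer, new_wer):
    """Difference in WER as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For example, with a purely hypothetical baseline WER of 20.0%, an absolute reduction of 3.3 would correspond to a new WER of 16.7% and a relative reduction of 16.5%. The distinction matters because the same absolute gain reads as a larger relative gain on a lower baseline.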