Exploration of End-to-End ASR for OpenSTT – Russian Open Speech-to-Text Dataset

06/15/2020
by Andrei Andrusenko, et al.

This paper presents an exploration of end-to-end automatic speech recognition (ASR) systems for the largest open-source Russian language data set, OpenSTT. We evaluate several existing end-to-end approaches: joint CTC/Attention, RNN-Transducer, and Transformer. All of them are compared with a strong hybrid ASR system based on an LF-MMI TDNN-F acoustic model. For the three available validation sets (phone calls, YouTube, and books), our best end-to-end model achieves word error rate (WER) of 34.8 respectively. Under the same conditions, the hybrid ASR system demonstrates 33.5
