The PyTorch-Kaldi Speech Recognition Toolkit

by Mirco Ravanelli, et al.

The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is now an established framework for developing state-of-the-art speech recognizers. PyTorch is used to build neural networks in Python and has recently spawned tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits and seeks to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not merely a simple interface between these toolkits; it also embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed so that user-defined acoustic models can be plugged in naturally. Alternatively, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly released along with rich documentation and is designed to work properly both locally and on HPC clusters. Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can be used effectively to develop modern state-of-the-art speech recognizers.




