Efficient Weight factorization for Multilingual Speech Recognition

05/07/2021
by   Ngoc-Quan Pham, et al.
0

End-to-end multilingual speech recognition involves using a single model training on a compositional speech corpus including many languages, resulting in a single neural network to handle transcribing different languages. Due to the fact that each language in the training data has different characteristics, the shared network may struggle to optimize for all various languages simultaneously. In this paper we propose a novel multilingual architecture that targets the core operation in neural networks: linear transformation functions. The key idea of the method is to assign fast weight matrices for each language by decomposing each weight matrix into a shared component and a language dependent component. The latter is then factorized into vectors using rank-1 assumptions to reduce the number of parameters per language. This efficient factorization scheme is proved to be effective in two multilingual settings with 7 and 27 languages, reducing the word error rates by 26% and 27% rel. for two popular architectures LSTM and Transformer, respectively.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/11/2019

Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Multilingual end-to-end (E2E) models have shown great promise in expansi...
research
09/27/2016

Multi-task Recurrent Model for True Multilingual Speech Recognition

Research on multilingual speech recognition remains attractive yet chall...
research
09/18/2023

Enhancing Multilingual Speech Recognition through Language Prompt Tuning and Frame-Level Language Adapter

Multilingual intelligent assistants, such as ChatGPT, have recently gain...
research
11/06/2017

Towards Language-Universal End-to-End Speech Recognition

Building speech recognizers in multiple languages typically involves rep...
research
07/13/2021

A Configurable Multilingual Model is All You Need to Recognize All Languages

Multilingual automatic speech recognition (ASR) models have shown great ...
research
06/23/2020

One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble

The task of grapheme-to-phoneme (G2P) conversion is important for both s...
research
03/29/2021

A Multiplexed Network for End-to-End, Multilingual OCR

Recent advances in OCR have shown that an end-to-end (E2E) training pipe...

Please sign up or login with your details

Forgot password? Click here to reset