Large-scale Transfer Learning for Low-resource Spoken Language Understanding

by   Xueli Jia, et al.

End-to-end Spoken Language Understanding (SLU) models are made increasingly large and complex to achieve the state-ofthe-art accuracy. However, the increased complexity of a model can also introduce high risk of over-fitting, which is a major challenge in SLU tasks due to the limitation of available data. In this paper, we propose an attention-based SLU model together with three encoder enhancement strategies to overcome data sparsity challenge. The first strategy focuses on the transferlearning approach to improve feature extraction capability of the encoder. It is implemented by pre-training the encoder component with a quantity of Automatic Speech Recognition annotated data relying on the standard Transformer architecture and then fine-tuning the SLU model with a small amount of target labelled data. The second strategy adopts multitask learning strategy, the SLU model integrates the speech recognition model by sharing the same underlying encoder, such that improving robustness and generalization ability. The third strategy, learning from Component Fusion (CF) idea, involves a Bidirectional Encoder Representation from Transformer (BERT) model and aims to boost the capability of the decoder with an auxiliary network. It hence reduces the risk of over-fitting and augments the ability of the underlying encoder, indirectly. Experiments on the FluentAI dataset show that cross-language transfer learning and multi-task strategies have been improved by up to 4:52 to the baseline.


page 1

page 2

page 3

page 4


Bidirectional Representations for Low Resource Spoken Language Understanding

Most spoken language understanding systems use a pipeline approach compo...

Two-stage Textual Knowledge Distillation to Speech Encoder for Spoken Language Understanding

End-to-end approaches open a new way for more accurate and efficient spo...

Understanding Semantics from Speech Through Pre-training

End-to-end Spoken Language Understanding (SLU) is proposed to infer the ...

Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition

Stream fusion, also known as system combination, is a common technique i...

Improving End-to-End Models for Set Prediction in Spoken Language Understanding

The goal of spoken language understanding (SLU) systems is to determine ...

Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding

End-to-end spoken language understanding (SLU) systems benefit from pret...

Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech

This work focuses on improving the Spoken Language Identification (LangI...

Please sign up or login with your details

Forgot password? Click here to reset