End-to-end model for named entity recognition from speech without paired training data

by Salima Mdhaffar, et al.

Recent work has shown that end-to-end neural approaches are becoming very popular for spoken language understanding (SLU). Here, end-to-end refers to the use of a single model optimized to extract semantic information directly from the speech signal. A major issue for such models is the lack of paired audio and textual data with semantic annotation. In this paper, we propose an approach to build an end-to-end neural model that extracts semantic information in a scenario in which no paired audio data is available. Our approach relies on an external model trained to generate a sequence of vector representations from text. These representations mimic the hidden representations that an end-to-end automatic speech recognition (ASR) model would generate internally when processing a speech signal. An SLU neural module is then trained using these representations as input and the annotated text as output. Finally, the SLU module replaces the top layers of the ASR model to form the end-to-end model. Our experiments on named entity recognition, carried out on the QUAERO corpus, show that this approach is very promising, yielding better results than a comparable cascade approach or the use of synthetic voices.


