Selecting Machine-Translated Data for Quick Bootstrapping of a Natural Language Understanding System

05/23/2018
by Judith Gaspers, et al.

This paper investigates the use of Machine Translation (MT) to bootstrap a Natural Language Understanding (NLU) system for a new language, for the use case of a large-scale voice-controlled device. The goal is to decrease the cost and time needed to obtain an annotated corpus for the new language while still covering a sufficiently large range of user requests. The paper investigates different methods of filtering MT data to keep utterances that improve NLU performance, as well as language-specific post-processing methods. These methods are tested on a large-scale NLU task in which around 10 million training utterances are translated from English to German. The results show a large improvement from using MT data over both a grammar-based baseline and an in-house data collection baseline, while greatly reducing manual effort. Both the filtering and the post-processing approaches improve results further.

research
12/02/2019

Language Model Bootstrapping Using Neural Machine Translation For Conversational Speech Recognition

Building conversational speech recognition systems for new languages is ...
research
07/06/2019

Evolutionary Algorithm for Sinhala to English Translation

Machine Translation (MT) is an area in natural language processing, whic...
research
06/25/2020

Neural Machine Translation For Paraphrase Generation

Training a spoken language understanding system, as the one in Alexa, ty...
research
04/06/2015

Bengali to Assamese Statistical Machine Translation using Moses (Corpus Based)

Machine dialect interpretation assumes a real part in encouraging man-ma...
research
05/09/2022

CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality

The machine translation (MT) task is typically formulated as that of ret...
research
07/05/2023

To be or not to be: a translation reception study of a literary text translated into Dutch and Catalan using machine translation

This article presents the results of a study involving the reception of ...
research
01/25/2022

Convex Polytope Modelling for Unsupervised Derivation of Semantic Structure for Data-efficient Natural Language Understanding

Popular approaches for Natural Language Understanding (NLU) usually rely...
