Multiple Segmentations of Thai Sentences for Neural Machine Translation

04/23/2020
by Alberto Poncelas, et al.

Thai is a low-resource language, so data is often not available in sufficient quantities to train a Neural Machine Translation (NMT) model that performs to a high level of quality. In addition, the Thai script does not use white space to delimit the boundaries between words, which adds complexity when building sequence-to-sequence models. In this work, we explore how to augment a set of English–Thai parallel data by replicating sentence pairs with different word segmentations of the Thai side, for use as NMT training data. Different segmentations of the Thai sentences are obtained by applying Byte Pair Encoding (BPE) with different numbers of merge operations. The experiments show that combining these datasets improves the performance of NMT models trained with a dataset that has been split using a supervised splitting tool.
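The augmentation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a toy, pure-Python BPE learned directly on raw Thai character sequences, and the names `learn_bpe`, `merge_pair`, `segment`, and `augment`, the example sentence pairs, and the merge counts are all assumptions made for illustration. A real setup would more likely use an established BPE toolkit, and would combine the result with data split by a supervised segmenter.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(sentences, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    corpus = [list(s) for s in sentences]
    merges = []
    for _ in range(num_merges):
        counts = Counter()
        for symbols in corpus:
            counts.update(zip(symbols, symbols[1:]))
        if not counts:
            break
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        corpus = [merge_pair(symbols, best) for symbols in corpus]
    return merges


def segment(sentence, merges):
    """Apply learned merge rules, in order, to an unsegmented sentence."""
    symbols = list(sentence)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def augment(pairs, merge_counts):
    """Replicate each (English, Thai) pair once per merge count.

    The Thai side of each copy is segmented with the first `n` merge rules,
    so every copy carries a different segmentation of the same sentence.
    """
    thai_side = [th for _, th in pairs]
    all_merges = learn_bpe(thai_side, max(merge_counts))
    augmented = []
    for n in merge_counts:
        rules = all_merges[:n]  # greedy BPE: the first n rules equal an n-merge model
        for en, th in pairs:
            augmented.append((en, " ".join(segment(th, rules))))
    return augmented


# Hypothetical toy data; a real corpus would contain many thousands of pairs.
pairs = [("hello", "สวัสดีครับ"), ("thank you", "ขอบคุณครับ")]
augmented = augment(pairs, merge_counts=[0, 2, 5])
for en, th in augmented:
    print(en, "|", th)
```

Each English sentence appears once per merge count, paired with a differently segmented copy of the same Thai sentence, so the training set grows without any new translations. Note that this sketch splits on Unicode code points, which separates Thai vowel and tone marks from their base consonants at zero merges.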

research
09/30/2019

Regressing Word and Sentence Embeddings for Regularization of Neural Machine Translation

In recent years, neural machine translation (NMT) has become the dominan...
research
05/18/2018

Combining Advanced Methods in Japanese-Vietnamese Neural Machine Translation

Neural machine translation (NMT) systems have recently obtained state-of...
research
09/26/2019

Selecting Artificially-Generated Sentences for Fine-Tuning Neural Machine Translation

Neural Machine Translation (NMT) models tend to achieve best performance...
research
08/10/2022

Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation

Although the problem of hallucinations in neural machine translation (NM...
research
10/24/2016

Bridging Neural Machine Translation and Bilingual Dictionaries

Neural Machine Translation (NMT) has become the new state-of-the-art in ...
research
10/09/2020

Uncertainty-Aware Semantic Augmentation for Neural Machine Translation

As a sequence-to-sequence generation task, neural machine translation (N...
research
08/01/2018

Low-Latency Neural Speech Translation

Through the development of neural machine translation, the quality of ma...
