Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

by Krupakar Hans, et al.

The advent of the attention mechanism in neural machine translation models has improved the performance of machine translation systems by enabling selective lookup into the source sentence. In this paper, the efficiency of translation using bidirectional-encoder, attention-decoder models was studied for translation involving morphologically rich languages. The English-Tamil language pair was selected for this analysis. First, the use of Word2Vec embeddings for both the English and Tamil words improved the translation results by 0.73 BLEU points over the baseline RNNSearch model, which has a BLEU score of 4.84. Second, applying morphological segmentation before word vectorization, to split the morphologically rich Tamil words into their respective morphemes before translation, reduced the target vocabulary size by a factor of 8. This model (RNNMorph) also improved the performance of neural machine translation by 7.05 BLEU points over the RNNSearch model trained on the same corpus. Since the BLEU evaluation of the RNNMorph model might be unreliable due to an increase in the number of matching tokens per sentence, the translations were also compared using the human evaluation metrics of adequacy, fluency and relative ranking. Further, the use of morphological segmentation also improved the efficacy of the attention mechanism.
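To illustrate why splitting words into morphemes can shrink the vocabulary while increasing the number of tokens per sentence, the following is a minimal sketch on a toy inventory. The stems, suffixes, `+` boundary marker, and the `segment` helper are all hypothetical placeholders: they are not real Tamil morphemes, and this is not the segmenter or data used in the paper.

```python
from itertools import product

# Hypothetical toy inventory; not real Tamil morphemes and not the paper's data.
STEMS = ["stem1", "stem2", "stem3", "stem4"]
SUFFIXES = ["+a", "+b", "+c"]


def build_surface_words(stems, suffixes):
    """Combine every stem with every suffix to mimic productive morphology."""
    return [stem + suffix for stem, suffix in product(stems, suffixes)]


def segment(word):
    """Toy segmenter (assumption): split at the '+' boundary marker."""
    stem, _, suffix = word.partition("+")
    return [stem, "+" + suffix]


words = build_surface_words(STEMS, SUFFIXES)

# Word-level vocabulary: one entry per distinct surface form.
word_vocab = set(words)

# Morpheme-level vocabulary: stems and suffixes are shared across words.
morph_vocab = {morpheme for word in words for morpheme in segment(word)}

# Each word now yields more than one token after segmentation.
tokens_per_word = len(segment(words[0]))

print(len(word_vocab), len(morph_vocab), tokens_per_word)  # prints: 12 7 2
```

In this toy setting the word-level vocabulary grows with the product of stems and suffixes, while the morpheme-level vocabulary grows with their sum. The same split doubles the token count per word, which is the kind of effect that could inflate n-gram matches and motivates the additional human evaluation.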



Word Representation Models for Morphologically Rich Languages in Neural Machine Translation

Dealing with the complex word forms in morphologically rich languages is...

Neural Machine Translation System of Indic Languages – An Attention based Approach

Neural machine translation (NMT) is a recent and effective technique whi...

Controlling Extra-Textual Attributes about Dialogue Participants: A Case Study of English-to-Polish Neural Machine Translation

Unlike English, morphologically rich languages can reveal characteristic...

Kannada Spell Checker with Sandhi Splitter

Spelling errors are introduced in text either during typing, or when the...

Neural Machine Translation for Cebuano to Tagalog with Subword Unit Translation

The Philippines is an archipelago composed of 7,641 different islands w...

Facilitating Terminology Translation with Target Lemma Annotations

Most of the recent work on terminology integration in machine translatio...

Iterative Refinement for Machine Translation

Existing machine translation decoding algorithms generate translations i...
