PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation

10/23/2021
by   Long Doan, et al.

We introduce a high-quality, large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs, which is 2.9M pairs larger than the benchmark Vietnamese-English machine translation corpus IWSLT15. We conduct experiments comparing strong neural baselines and well-known automatic translation engines on our dataset and find that, in both automatic and human evaluations, the best performance is obtained by fine-tuning the pre-trained sequence-to-sequence denoising auto-encoder mBART. To the best of our knowledge, this is the first large-scale Vietnamese-English machine translation study. We hope our publicly available dataset and study can serve as a starting point for future research and applications on Vietnamese-English machine translation.

research
08/08/2022

A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

In this paper, we introduce a high-quality and large-scale benchmark dat...
research
09/15/2019

A simple discriminative training method for machine translation with large-scale features

Margin infused relaxed algorithms (MIRAs) dominate model tuning in stati...
research
06/06/2021

Itihasa: A large-scale corpus for Sanskrit to English translation

This work introduces Itihasa, a large-scale translation dataset containi...
research
01/30/2023

Adaptive Machine Translation with Large Language Models

Consistency is a key requirement of high-quality translation. It is espe...
research
01/18/2021

Automatic punctuation restoration with BERT models

We present an approach for automatic punctuation restoration with BERT m...
research
04/29/2021

Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

Human evaluation of modern high-quality machine translation systems is a...
research
10/26/2019

Y'all should read this! Identifying Plurality in Second-Person Personal Pronouns in English Texts

Distinguishing between singular and plural "you" in English is a challen...
