WeTS: A Benchmark for Translation Suggestion

10/11/2021
by Zhen Yang, et al.

Translation Suggestion (TS), which provides alternatives for specific words or phrases given the entire document translated by machine translation (MT) <cit.>, has been shown to play a significant role in post-editing (PE). However, there is still no publicly available data set to support in-depth research on this problem, and no reproducible experimental results for researchers in this community to build on. To address this limitation, we create a benchmark data set for TS, called WeTS, which contains a golden corpus annotated by expert translators in four translation directions. Apart from the human-annotated golden corpus, we also propose several novel methods for generating synthetic corpora, which can substantially improve TS performance. With the corpus we construct, we introduce a Transformer-based model for TS, and experimental results show that our model achieves state-of-the-art (SOTA) results in all four translation directions: English-to-German, German-to-English, Chinese-to-English and English-to-Chinese. Code and corpus can be found at <https://github.com/ZhenYangIACAS/WeTS.git>.

