ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics

10/27/2022
by Chantal Amrhein, et al.

As machine translation (MT) metrics improve their correlation with human judgement every year, it is crucial to understand the limitations of such metrics at the segment level. Specifically, it is important to investigate metric behaviour when facing accuracy errors in MT because these can have dangerous consequences in certain contexts (e.g., legal, medical). We curate ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. We use ACES to evaluate a wide range of MT metrics including the submissions to the WMT 2022 metrics shared task and perform several analyses leading to general recommendations for metric developers. We recommend: a) combining metrics with different strengths, b) developing metrics that give more weight to the source and less to surface-level overlap with the reference and c) explicitly modelling additional language-specific information beyond what is available via multilingual embeddings.
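To make the contrastive setup concrete, the following is a minimal sketch of how a challenge-set example of this kind is typically scored: each example pairs a good translation with an incorrect one containing a single targeted accuracy error, and a metric passes the example if it assigns the good translation a higher score. The unigram-F1 scoring function and the German-English negation example below are illustrative stand-ins, not material from ACES or any metric evaluated in the paper.

    from collections import Counter

    def unigram_f1(candidate: str, reference: str) -> float:
        """Toy reference-based metric: unigram F1 against the reference."""
        cand = candidate.lower().split()
        ref = reference.lower().split()
        overlap = sum((Counter(cand) & Counter(ref)).values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(cand)
        recall = overlap / len(ref)
        return 2 * precision * recall / (precision + recall)

    # Hypothetical challenge-set example (not from ACES): the incorrect
    # translation drops a negation, a word-level accuracy error.
    example = {
        "source": "Der Patient darf das Medikament nicht einnehmen.",
        "reference": "The patient must not take the medication.",
        "good_translation": "The patient is not allowed to take the medication.",
        "incorrect_translation": "The patient is allowed to take the medication.",
    }

    good = unigram_f1(example["good_translation"], example["reference"])
    bad = unigram_f1(example["incorrect_translation"], example["reference"])

    # The metric passes the example only if it prefers the good translation.
    print(f"good={good:.3f} incorrect={bad:.3f} pass={good > bad}")

A surface-overlap metric happens to pass this particular example because the perturbation deletes a token that appears in the reference; many of the phenomena the paper studies (e.g., substituted named entities, discourse and real-world-knowledge errors) leave no such obvious surface trace, which is why reference-overlap alone is insufficient.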

Related research

11/28/2019 · DiscoTK: Using Discourse Structure for Machine Translation Evaluation
We present novel automatic metrics for machine translation evaluation th...

09/27/2022 · Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric
We hypothesize that existing sentence-level machine translation (MT) met...

04/02/2017 · Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings
One of the most important problems in machine translation (MT) evaluatio...

10/07/2018 · Assessing Crosslingual Discourse Relations in Machine Translation
In an attempt to improve overall translation quality, there has been an ...

11/02/2019 · Machine Translation Evaluation using Bi-directional Entailment
In this paper, we propose a new metric for Machine Translation (MT) eval...

02/11/2022 · Evaluating MT Systems: A Theoretical Framework
This paper outlines a theoretical framework using which different automa...

12/20/2022 · Extrinsic Evaluation of Machine Translation Metrics
Automatic machine translation (MT) metrics are widely used to distinguis...
