Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models: A Case Study on ChatGPT

03/24/2023
by Qingyu Lu, et al.

Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable proficiency across several NLP tasks, such as machine translation, question answering, text summarization, and natural language understanding. Recent research has shown that using ChatGPT to assess the quality of machine translation (MT) achieves state-of-the-art performance at the system level but performs poorly at the segment level. To further improve the performance of LLMs on MT quality assessment, we investigated several prompting methods. Our results indicate that by combining Chain-of-Thoughts and Error Analysis into a new prompting method, Error Analysis Prompting (EAPrompt), LLMs like ChatGPT can generate human-like MT evaluations at both the system and segment levels. Additionally, we identified some limitations of ChatGPT as an MT evaluator, such as unstable scoring and biases when it is given multiple translations in a single query. Our findings aim to provide preliminary guidance for appropriately evaluating translation quality with ChatGPT, while offering a variety of tricks for designing prompts for in-context learning. We anticipate that this report will shed new light on advancing the field of translation evaluation with LLMs by enhancing both the accuracy and reliability of metrics. The project can be found at <https://github.com/Coldmist-Lu/ErrorAnalysis_Prompt>.
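
Below is a minimal sketch of how an error-analysis-style evaluation prompt and score could be wired together. The prompt wording, the `query_llm` wrapper, and the MQM-style penalty weights are illustrative assumptions rather than the paper's exact prompt or scoring scheme; the repository linked above contains the authors' actual prompts.

```python
import re

# Hypothetical wrapper around a chat LLM (e.g., the ChatGPT API); plug a real
# chat-completion client in here. Assumed for illustration only.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("connect this to your chat-completion client")

# Illustrative EAPrompt-style template: the model is asked to reason about
# errors step by step (Chain-of-Thoughts) and then list them as major/minor
# errors (Error Analysis). This wording is an assumption, not the paper's.
EA_TEMPLATE = """You are evaluating a machine translation.
Source: {src}
Reference: {ref}
Translation: {hyp}

First, reason step by step about any errors in the translation.
Then list them under two headings:
Major errors: (errors that seriously distort the meaning)
Minor errors: (small fluency or wording issues)
"""

def mqm_like_score(response: str, w_major: float = 5.0, w_minor: float = 1.0) -> float:
    """Convert the model's error listing into a segment-level score.

    Counts bulleted/numbered items under each heading and applies MQM-style
    penalty weights (illustrative defaults); scores closer to 0 are better.
    """
    def count(heading: str) -> int:
        # Capture the text between this heading and the next capitalized line
        # (or the end of the response), then count list items in it.
        m = re.search(heading + r" errors:(.*?)(?=\n[A-Z]|\Z)", response, re.S)
        if not m:
            return 0
        return len(re.findall(r"^\s*(?:[-*]|\d+\.)", m.group(1), re.M))

    return -(w_major * count("Major") + w_minor * count("Minor"))

# Usage: score one translation per query; the abstract notes that putting
# several translations into a single query can bias the evaluation.
# response = query_llm(EA_TEMPLATE.format(src=src, ref=ref, hyp=hyp))
# print(mqm_like_score(response))
```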

