Evaluation of really good grammatical error correction

08/17/2023
by   Robert Östling, et al.

Although rarely stated explicitly, in practice Grammatical Error Correction (GEC) encompasses various models with distinct objectives, ranging from grammatical error detection to improving fluency. Traditional evaluation methods fail to capture the full range of system capabilities and objectives. Reference-based evaluations suffer from limitations in capturing the wide variety of possible corrections and from the biases introduced during reference creation, and are prone to favoring the fixing of local errors over overall text improvement. The emergence of large language models (LLMs) has further highlighted the shortcomings of these evaluation strategies, emphasizing the need for a paradigm shift in evaluation methodology. In the current study, we perform a comprehensive evaluation of various GEC systems using a recently published dataset of Swedish learner texts. The evaluation is performed using established evaluation metrics as well as human judges. We find that GPT-3 in a few-shot setting by far outperforms previous grammatical error correction systems for Swedish, a language comprising only 0.11% of its training data. We also found that current evaluation methods contain undesirable biases that a human evaluation is able to reveal. We suggest using human post-editing of GEC system outputs to analyze the amount of change required to reach native-level human performance on the task, and provide a dataset annotated with human post-edits and assessments of grammaticality, fluency and meaning preservation of GEC system outputs.
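As a rough illustration of the few-shot setting mentioned in the abstract, a GEC prompt for an LLM is typically assembled by prepending a handful of (erroneous, corrected) demonstration pairs to the input sentence. The sketch below is a minimal, hypothetical version of this idea; the example sentence pairs and prompt wording are invented and do not reproduce the prompts actually used in the study.

```python
# Minimal sketch of few-shot prompt construction for grammatical
# error correction (GEC). The demonstration pairs below are
# invented for illustration only.

FEW_SHOT_EXAMPLES = [
    ("She go to school every day.", "She goes to school every day."),
    ("I have saw that movie.", "I have seen that movie."),
]

def build_gec_prompt(source: str) -> str:
    """Assemble a few-shot prompt: an instruction, the
    demonstration pairs, then the sentence to be corrected."""
    parts = ["Correct the grammatical errors in each sentence."]
    for erroneous, corrected in FEW_SHOT_EXAMPLES:
        parts.append(f"Input: {erroneous}\nOutput: {corrected}")
    # Leave the final Output: empty for the model to complete.
    parts.append(f"Input: {source}\nOutput:")
    return "\n\n".join(parts)

prompt = build_gec_prompt("He don't like apples.")
print(prompt)
```

The completed text after the final "Output:" would then be taken as the system's correction; evaluating such free-form outputs is exactly where the reference-based metrics discussed above run into trouble.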


research

10/07/2016
There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction
Current methods for automatically evaluating grammatical error correctio...

03/25/2023
An Analysis of GPT-3's Performance in Grammatical Error Correction
GPT-3 models are very powerful, achieving high performance on a variety ...

04/30/2022
A New Evaluation Method: Evaluation Data and Metrics for Chinese Grammar Error Correction
As a fundamental task in natural language processing, Chinese Grammatica...

12/31/2020
Factual Error Correction of Claims
This paper introduces the task of factual error correction: performing e...

03/15/2023
ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark
ChatGPT is a cutting-edge artificial intelligence language model develop...

04/30/2018
Inherent Biases in Reference-based Evaluation for Grammatical Error Correction and Text Simplification
The prevalent use of too few references for evaluating text-to-text gene...

10/21/2020
Classifying Syntactic Errors in Learner Language
We present a method for classifying syntactic errors in learner language...
