Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary

10/01/2020
by Daniel Deutsch, et al.

Recently, there has been growing interest in using question-answering (QA) models to evaluate the content quality of summaries. While previous work has shown initial promising results in this direction, its experimentation has been limited, leading to a poor understanding of the utility of QA in evaluating summary content. In this work, we perform an extensive evaluation of a QA-based metric for summary content quality, calculating its performance with today's state-of-the-art models as well as estimating its potential upper-bound performance. We analyze a proposed metric, QAEval, which is more widely applicable than previous work. We show that QAEval already achieves state-of-the-art performance at scoring summarization systems, beating all other metrics including the gold-standard Pyramid Method, while its performance on individual summaries is at best competitive with other automatic metrics. Through a careful analysis of each component of QAEval, we identify the performance bottlenecks and estimate that, with human-level performance on those components, QAEval's summary-level results have the potential to approach those of the Pyramid Method.
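To make the general idea concrete, the sketch below shows the basic scoring loop that a QA-based content metric of this kind follows: QA pairs are generated from the reference summary, a QA model answers each question while reading only the candidate summary, and the metric is the average answer-verification score. This is a minimal illustration under stated assumptions, not the paper's implementation: `qa_pairs` and `answer_fn` are hypothetical placeholders for the question-generation and QA models, and token-level F1 is used here as a simple answer-verification choice rather than QAEval's exact one.

```python
from typing import Callable, List, Tuple


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Count overlapping tokens (multiset intersection).
    ref_counts: dict = {}
    for t in ref_tokens:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred_tokens:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def qa_content_score(
    qa_pairs: List[Tuple[str, str]],                 # (question, gold answer) derived from the reference summary
    answer_fn: Callable[[str, str], str],            # hypothetical QA model: (question, candidate summary) -> answer
    candidate_summary: str,
) -> float:
    """Average answer-verification score over all questions, read against the candidate only."""
    if not qa_pairs:
        return 0.0
    scores = [token_f1(answer_fn(q, candidate_summary), gold) for q, gold in qa_pairs]
    return sum(scores) / len(scores)
```

A system-level score can then be obtained by averaging this summary-level score over a system's outputs; the abstract's finding is that correlations with human judgments are strong at the system level, while summary-level correlations are limited by the quality of the question-generation and QA components.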
