Revisiting Automatic Question Summarization Evaluation in the Biomedical Domain

03/18/2023
by   Hongyi Yuan, et al.

Automatic evaluation metrics have facilitated the rapid development of automatic summarization methods by providing instant and fair assessments of summary quality. Most metrics were developed for the general domain, especially for news and meeting notes, or for other language-generation tasks, yet they are routinely applied to evaluate summarization systems in other domains, such as biomedical question summarization. To better understand whether commonly used evaluation metrics are capable of evaluating automatic summarization in the biomedical domain, we conduct human evaluations of summary quality along four different aspects of a biomedical question summarization task. Based on these human judgments, we identify noteworthy features of current automatic metrics as well as of the summarization systems themselves. We also release a dataset of our human annotations to aid research on summarization evaluation metrics in the biomedical domain.


Related research

04/05/2023  Human-like Summarization Evaluation with ChatGPT
    Evaluating text summarization is a challenging problem, and existing eva...

05/13/2021  Towards Human-Free Automatic Quality Evaluation of German Summarization
    Evaluating large summarization corpora using humans has proven to be exp...

07/24/2020  SummEval: Re-evaluating Summarization Evaluation
    The scarcity of comprehensive up-to-date studies on evaluation metrics f...

06/02/2021  Evaluating the Efficacy of Summarization Evaluation across Languages
    While automatic summarization evaluation methods developed for English a...

05/27/2023  An Investigation of Evaluation Metrics for Automated Medical Note Generation
    Recent studies on automatic note generation have shown that doctors can ...

10/10/2022  Readability Controllable Biomedical Document Summarization
    Different from general documents, it is recognised that the ease with wh...

08/31/2022  The Glass Ceiling of Automatic Evaluation in Natural Language Generation
    Automatic evaluation metrics capable of replacing human judgments are cr...
