Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

by Kung-Hsiang Huang, et al.

Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, to our knowledge, the summarization of diverse information dispersed across multiple articles about an event has not been previously investigated. The latter imposes a different set of challenges for a summarization model. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference. Moreover, we conducted a comprehensive analysis to pinpoint the position and verbosity biases when utilizing Large Language Model (LLM)-based metrics for evaluating the coverage and faithfulness of the summaries, as well as their correlation with human assessments. We applied our findings to study how LLMs summarize multiple news articles by analyzing which type of diverse information LLMs are capable of identifying. Our analyses suggest that despite the extraordinary capabilities of LLMs in single-document summarization, the proposed task remains a complex challenge for them mainly due to their limited coverage, with GPT-4 only able to cover less than 40% of the diverse information on average.



Summarization of Films and Documentaries Based on Subtitles and Scripts

We assess the performance of generic text summarization algorithms appli...

Generating Representative Headlines for News Stories

of news articles are published online every day, which can be overwhelm...

SumREN: Summarizing Reported Speech about Events in News

A primary objective of news articles is to establish the factual record ...

Screenplay Summarization Using Latent Narrative Structure

Most general-purpose extractive summarization models are trained on news...

Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles

Recent advances in natural language processing have enabled automation o...

Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization

Despite the recent developments on neural summarization systems, the und...

Content Selection in Deep Learning Models of Summarization

We carry out experiments with deep learning models of summarization acro...
