Corpora Evaluation and System Bias Detection in Multi-document Summarization

10/05/2020
by Alvin Dey, et al.

Multi-document summarization (MDS) is the task of condensing the key points of a set of documents into a concise text paragraph. It has been used to aggregate news, tweets, product reviews, etc. from various sources. Because the task has no standard definition, we encounter a plethora of datasets with varying levels of overlap and conflict between participating documents. There is also no standard regarding what constitutes summary information in MDS. Adding to the challenge, new systems report results on a chosen set of datasets, and these results might not correlate with their performance on other datasets. In this paper, we study this heterogeneous task with the help of a few widely used MDS corpora and a suite of state-of-the-art models. We attempt to quantify the quality of a summarization corpus and prescribe a list of points to consider when proposing a new MDS corpus. Next, we analyze why no MDS system achieves superior performance across all corpora. We then observe the extent to which system metrics are influenced, and bias is propagated, by corpus properties. The scripts to reproduce the experiments in this work are available at https://github.com/LCS2-IIITD/summarization_bias.git.

