ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization

01/14/2022
by Mengsay Loem, et al.

Neural models trained on large amounts of parallel data have achieved impressive performance in abstractive summarization. However, large-scale parallel corpora are expensive and challenging to construct. In this work, we introduce ExtraPhrase, a low-cost and effective strategy for augmenting training data for abstractive summarization. ExtraPhrase constructs pseudo training data in two steps: extractive summarization and paraphrasing. The extractive summarization step extracts the major parts of an input text, and the paraphrasing step generates diverse expressions of the extracted summary. Experiments show that ExtraPhrase improves abstractive summarization performance by more than 0.50 ROUGE points over a setting without data augmentation, and that it outperforms existing methods such as back-translation and self-training. ExtraPhrase is especially effective when the amount of genuine training data is small, i.e., in low-resource settings. Moreover, it is more cost-efficient than the existing approaches.
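The two-step pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the extractive step here is a crude word-frequency centrality score, and the paraphrasing step is a toy substitution table standing in for a learned paraphrase model.

```python
import re
from collections import Counter


def extract_summary(text, n_sentences=1):
    """Toy extractive step: keep the sentence(s) whose words occur most
    frequently across the document (a crude centrality score)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freqs = Counter(w.lower() for s in sentences for w in re.findall(r"\w+", s))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freqs[w] for w in words) / max(len(words), 1)

    ranked = sorted(sentences, key=score, reverse=True)
    return " ".join(ranked[:n_sentences])


# Hypothetical substitution table standing in for a paraphrase model.
PARAPHRASES = {"fast": "quick", "method": "approach", "big": "large"}


def paraphrase(summary):
    """Toy paraphrasing step: rewrite the summary word by word."""
    return " ".join(PARAPHRASES.get(w, w) for w in summary.split())


def build_pseudo_pair(document):
    """Return a (source, pseudo-target) pair for augmenting training data."""
    return document, paraphrase(extract_summary(document))
```

In the actual method, the extractive step would be a proper extractive summarizer and the paraphrasing step a trained paraphrase generator; the point of the sketch is only the overall flow, where each genuine document yields an additional pseudo source-target pair.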


