Dataset for Automatic Summarization of Russian News

06/19/2020
by Ilya Gusev, et al.

Automatic text summarization has been studied in a variety of domains and languages. However, this is not the case for the Russian language. To address this gap, we present Gazeta, the first dataset for summarization of Russian news. We describe the properties of this dataset and benchmark several extractive and abstractive models. We demonstrate that the dataset is a valid benchmark for Russian text summarization methods. Additionally, we show that the pretrained mBART model is useful for Russian text summarization.


