MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions

09/26/2019
by Christina Niklaus, et al.

We compiled a new sentence splitting corpus composed of 203K pairs of aligned complex source and simplified target sentences. In contrast to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset in which each input sentence is broken down into a set of minimal propositions, i.e., a sequence of sound, self-contained utterances, each presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. The corpus is useful for developing sentence splitting approaches that learn to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences with a simple and more regular structure. Such sentences are easier for downstream applications to process, which facilitates those applications and improves their performance.


