TLDR: Extreme Summarization of Scientific Documents
We introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression requiring expert background knowledge and complex language understanding. To facilitate research on this task, we introduce SciTLDR, a dataset of 3.9K TLDRs. Furthermore, we introduce a novel annotation protocol for scalably curating additional gold summaries by rewriting peer review comments. We use this protocol to augment our test set, yielding multiple gold TLDRs for evaluation, which is unlike most recent summarization datasets that assume only one valid gold summary. We present a training strategy for adapting pretrained language models that exploits similarities between TLDR generation and the related tasks of extreme summarization and title generation, which outperforms strong extractive and abstractive summarization baselines.
READ FULL TEXT