TDMSci: A Specialized Corpus for Scientific Literature Entity Tagging of Tasks Datasets and Metrics

01/25/2021
by Yufang Hou, et al.

Tasks, datasets, and evaluation metrics are important concepts for understanding experimental scientific papers. However, most previous work on information extraction from scientific literature focuses only on abstracts and does not treat datasets as a separate entity type (Zadeh and Schumann, 2016; Luan et al., 2018). In this paper, we present a new corpus that contains domain-expert annotations of Task (T), Dataset (D), and Metric (M) entities in 2,000 sentences extracted from NLP papers. We report experimental results on TDM extraction using a simple data augmentation strategy and apply our tagger to around 30,000 NLP papers from the ACL Anthology. The corpus is publicly available to the community to foster research on scientific publication summarization (Erera et al., 2019) and knowledge discovery.
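
To make the idea of sentence-level Task, Dataset, and Metric tagging concrete, here is a minimal sketch of how entity spans could be converted into per-token BIO tags, a common encoding for sequence taggers. The abstract does not specify the corpus file format, so the span representation, the label names (TASK, DATASET, METRIC), the function name spans_to_bio, and the example sentence are all assumptions made for illustration, not the authors' format or data.

```python
from typing import List, Tuple

# Hypothetical span format (an assumption, not the corpus format):
# (start_token_index, end_token_index_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level entity spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        # The first token of an entity gets B-, the remaining tokens get I-.
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Invented example sentence, for illustration only.
tokens = "We evaluate machine translation on WMT14 using BLEU .".split()
spans = [(2, 4, "TASK"), (5, 6, "DATASET"), (7, 8, "METRIC")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Keeping Dataset as its own label, rather than folding it into a generic material or method class, mirrors the distinction the abstract draws with earlier corpora.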
