Unsupervised Extraction of Representative Concepts from Scientific Literature

by Adit Krishnan, et al.

This paper studies the automated categorization and extraction of scientific concepts from the titles of scientific articles, in order to gain a deeper understanding of their key contributions and to facilitate the construction of a generic academic knowledge base. Towards this goal, we propose an unsupervised, domain-independent, and scalable two-phase algorithm that types key concept mentions by aspect of interest (e.g., Techniques, Applications) and extracts them. In the first phase, we propose PhraseType, a probabilistic generative model that exploits textual features and a limited set of POS tags to coarsely segment text snippets into aspect-typed phrases. We extend this model to simultaneously learn aspect-specific features and identify academic domains in multi-domain corpora, since the two tasks mutually reinforce each other. In the second phase, we propose an approach based on adaptor grammars that extracts fine-grained concept mentions from the aspect-typed phrases in a purely data-driven manner, without any external resources or human effort. We apply our technique to literature from diverse scientific domains and show significant gains over state-of-the-art concept extraction techniques. We also present a qualitative analysis of the results.
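To make the input and output of the two-phase pipeline concrete, here is a toy sketch. It is not PhraseType and not the adaptor-grammar approach. Phase one is replaced by a hypothetical rule that splits a title at connector words and tags each span with an aspect (the `CONNECTOR_ASPECT` table and its labels are assumptions). Phase two is replaced by plain frequent n-gram counting over phrases of the same aspect, keeping only the longest n-grams that recur. The function names `segment_title` and `frequent_concepts` are illustrative.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical connector -> aspect table (an assumption, not learned by PhraseType).
CONNECTOR_ASPECT = {
    "for": "Application",
    "in": "Application",
    "using": "Technique",
    "via": "Technique",
    "with": "Technique",
    "by": "Technique",
}


def segment_title(title: str) -> List[Tuple[str, str]]:
    """Toy phase 1: split a title at connector words into (phrase, aspect) pairs."""
    segments: List[Tuple[str, str]] = []
    current: List[str] = []
    aspect = "Technique"  # hypothetical default aspect for the leading span
    for tok in title.split():
        key = tok.lower().strip(",:;")
        if key in CONNECTOR_ASPECT:
            if current:  # close the span that precedes this connector
                segments.append((" ".join(current), aspect))
                current = []
            aspect = CONNECTOR_ASPECT[key]  # the connector types the next span
        else:
            current.append(tok)
    if current:
        segments.append((" ".join(current), aspect))
    return segments


def frequent_concepts(phrases: List[str], n_max: int = 3, min_count: int = 2) -> List[str]:
    """Toy phase 2: recurring word n-grams, dropping ones subsumed by a longer one."""
    counts = Counter()
    for phrase in phrases:
        words = phrase.lower().split()
        grams = {
            " ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)
        }
        counts.update(grams)  # each n-gram counted at most once per phrase
    frequent = {g: c for g, c in counts.items() if c >= min_count}
    # Keep g unless a longer frequent n-gram contains it with the same count.
    return sorted(
        g for g, c in frequent.items()
        if not any(h != g and f" {g} " in f" {h} " and frequent[h] == c for h in frequent)
    )


if __name__ == "__main__":
    titles = [
        "Deep Learning for Image Segmentation",
        "Graph Neural Networks for Image Segmentation",
        "Deep Learning using Graph Neural Networks",
    ]
    typed = [seg for t in titles for seg in segment_title(t)]
    techniques = [p for p, a in typed if a == "Technique"]
    applications = [p for p, a in typed if a == "Application"]
    print(frequent_concepts(techniques))    # ['deep learning', 'graph neural networks']
    print(frequent_concepts(applications))  # ['image segmentation']
```

A fixed connector table is exactly the kind of hand-built, domain-specific resource the abstract says the method avoids; the sketch only shows the shape of the data (titles to aspect-typed phrases to concept mentions), not how the model learns it.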


Domain-independent Extraction of Scientific Concepts from Research Articles

We examine the novel task of domain-independent scientific concept extra...

Unsupervised Keyphrase Extraction via Interpretable Neural Networks

Keyphrase extraction aims at automatically extracting a list of "importa...

Automated Creation and Human-assisted Curation of Computable Scientific Models from Code and Text

Scientific models hold the key to better understanding and predicting th...

Will This Idea Spread Beyond Academia? Understanding Knowledge Transfer of Scientific Concepts across Text Corpora

What kind of basic research ideas are more likely to get applied in prac...

High-Precision Extraction of Emerging Concepts from Scientific Literature

Identification of new concepts in scientific literature can help power f...

A Review on Method Entities in the Academic Literature: Extraction, Evaluation, and Application

In scientific research, the method is an indispensable means to solve sc...

Toponym Identification in Epidemiology Articles -- A Deep Learning Approach

When analyzing the spread of viruses, epidemiologists often need to iden...
