Unsupervised Key-phrase Extraction and Clustering for Classification Scheme in Scientific Publications

by   Xiajing Li, et al.

Several methods have been explored for automating parts of Systematic Mapping (SM) and Systematic Review (SR) methodologies. Challenges typically evolve around the gaps in semantic understanding of text, as well as lack of domain and background knowledge necessary to bridge that gap. In this paper we investigate possible ways of automating parts of the SM/SR process, i.e. that of extracting keywords and key-phrases from scientific documents using unsupervised methods, which are then used as a basis to construct the corresponding Classification Scheme using semantic key-phrase clustering techniques. Specifically, we explore the effect of ensemble scores measure in key-phrase extraction, we explore semantic network based word embedding in embedding representation of phrase semantics and finally we also explore how clustering can be used to group related key-phrases. The evaluation is conducted on a dataset of publications pertaining the domain of "Explainable AI" which we constructed using standard publicly available digital libraries and sets of indexing terms (keywords). Results shows that: ensemble ranking score does improve the key-phrase extraction performance. Semantic-network based word embedding based on the ConceptNet Semantic Network has similar performance with contextualized word embedding, however the former are computationally more efficient. Finally Semantic key-phrase clustering at term-level can group similar terms together that can be suitable for classification scheme.


page 1

page 2

page 3

page 4


Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Keyword extraction is a fundamental task in natural language processing ...

Using Semantic Role Knowledge for Relevance Ranking of Key Phrases in Documents: An Unsupervised Approach

In this paper, we investigate the integration of sentence position and s...

Patents Phrase to Phrase Semantic Matching Dataset

There are many general purpose benchmark datasets for Semantic Textual S...

Phraseformer: Multimodal Key-phrase Extraction using Transformer and Graph Embedding

Background: Keyword extraction is a popular research topic in the field ...

Unsupervised Keyphrase Extraction via Interpretable Neural Networks

Keyphrase extraction aims at automatically extracting a list of "importa...

PoD: Positional Dependency-Based Word Embedding for Aspect Term Extraction

Dependency context-based word embedding jointly learns the representatio...

Key Phrase Classification in Complex Assignments

Complex assignments typically consist of open-ended questions with large...

Please sign up or login with your details

Forgot password? Click here to reset