Using citation networks to evaluate the impact of text length on the identification of relevant concepts

by Jorge A. V. Tohalino, et al.

Identifying the most significant concepts in unstructured data is critical in many practical applications. Although many methods have been proposed to extract the main topics of texts, few studies have probed the impact of text length on the performance of keyword extraction (KE) methods. In this study, we adopted a network-based approach to evaluate whether keywords extracted from paper abstracts are compatible with keywords extracted from full papers. We employed community detection methods to identify groups of related papers in citation networks, and then used these paper clusters to extract keywords from abstracts. Our results indicate that, while the various community detection methods employed in our KE approach yielded similar levels of accuracy, a correlation analysis revealed that these methods produced distinct keyword lists for each abstract. All considered approaches, however, reached low accuracy values. Surprisingly, text clustering approaches outperformed all citation-based methods. The findings suggest that using different sources of information to extract keywords can lead to significant differences in performance. This effect can play an important role in applications that rely on identifying relevant concepts.
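
As a rough illustration of the pipeline the abstract describes (cluster papers through a citation network, then use the clusters to pick keywords from each abstract), here is a minimal sketch. Everything in it is an assumption made for illustration, not the authors' method: the toy papers and citations are invented, label propagation stands in for whichever community detection methods the study used, and the TF-IDF-style score over clusters is only one plausible way to turn clusters into keywords. The function names `detect_communities`, `community_term_scores`, and `abstract_keywords` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented toy data: paper id -> abstract text, plus undirected citation links.
ABSTRACTS = {
    "p1": "keyword extraction from text using graph ranking",
    "p2": "graph ranking for keyword extraction in documents",
    "p3": "keyword extraction with word graph centrality",
    "p4": "protein folding simulation with molecular dynamics",
    "p5": "molecular dynamics of protein structure",
    "p6": "protein structure prediction and folding",
}
CITATIONS = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
             ("p4", "p5"), ("p5", "p6"), ("p4", "p6"), ("p3", "p4")]


def detect_communities(nodes, edges, max_rounds=20):
    """Deterministic label propagation (an assumed stand-in method)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    labels = {n: n for n in nodes}  # every paper starts in its own group
    for _ in range(max_rounds):
        changed = False
        for n in sorted(nodes):
            # Count the node's own label and its neighbors' labels;
            # ties are broken by the largest label so runs are repeatable.
            counts = Counter([labels[n]] + [labels[m] for m in neighbors[n]])
            best = max(counts, key=lambda lab: (counts[lab], lab))
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for n in sorted(nodes):
        groups[labels[n]].append(n)
    return sorted(groups.values())


def community_term_scores(communities, abstracts):
    """Score words per community: frequency inside it, damped if shared."""
    tf = [Counter(w for p in c for w in abstracts[p].lower().split())
          for c in communities]
    df = Counter(w for counts in tf for w in counts)  # communities per word
    n = len(communities)
    return [{w: c * math.log(1 + n / df[w]) for w, c in counts.items()}
            for counts in tf]


def abstract_keywords(paper, communities, scores, abstracts, k=3):
    """Rank one abstract's words by the score of the paper's community."""
    idx = next(i for i, c in enumerate(communities) if paper in c)
    words = set(abstracts[paper].lower().split())
    return sorted(words, key=lambda w: (-scores[idx][w], w))[:k]


communities = detect_communities(list(ABSTRACTS), CITATIONS)
scores = community_term_scores(communities, ABSTRACTS)
for paper in sorted(ABSTRACTS):
    print(paper, abstract_keywords(paper, communities, scores, ABSTRACTS))
```

On real data one would also remove stopwords and compare the resulting lists against keywords taken from the full text; both steps are left out here to keep the sketch short.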




On the Stability of Citation Networks

Citation networks can reveal many important information regarding the de...

Query Generation for Patent Retrieval with Keyword Extraction based on Syntactic Features

This paper describes a new method to extract relevant keywords from pate...

FRAKE: Fusional Real-time Automatic Keyword Extraction

Keyword extraction is called identifying words or phrases that express t...

Using virtual edges to extract keywords from texts modeled as complex networks

Detecting keywords in texts is important for many text mining applicatio...

Analyzing the relationship between text features and research proposal productivity

Predicting the output of research grants is of considerable relevance to...

Cited Text Spans for Citation Text Generation

Automatic related work generation must ground their outputs to the conte...

An interdisciplinary survey of network similarity methods

Comparative graph and network analysis play an important role in both sy...
