Probing the topological properties of complex networks modeling short written texts

12/29/2014
by Diego R. Amancio, et al.

In recent years, graph theory has been widely employed to probe several language properties. More specifically, the so-called word adjacency model has proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. Most studies that treat texts as networks have considered either large pieces of text or entire books. This approach has been informative, but it leaves an open question: are there important topological patterns in small pieces of text? To address this question, the topological properties of subtexts sampled from entire books were probed. Statistical analyses performed on a dataset comprising 50 novels revealed that most of the traditional topological measurements are stable for short subtexts. When the performance of the authorship recognition task was analyzed, it was found that a proper sampling yields a discriminability similar to the one found with full texts. Surprisingly, the support vector machine classification based on the characterization of short texts outperformed the one performed with entire books. These findings suggest that a local topological analysis of large documents might improve their global characterization. Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks. As a consequence, the techniques described here can be extended in a straightforward fashion to analyze texts as time-varying complex networks.
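
To make the idea concrete, the following is a minimal sketch of a word adjacency network built from short subtexts, assuming a simplified setup that is not taken from the paper: words are lowercased, no stopword removal or lemmatization is applied, subtexts are consecutive non-overlapping windows of a fixed number of words, and links are treated as undirected. The function names (`tokenize`, `sample_subtexts`, `adjacency_network`, `mean_degree`, `mean_clustering`, `describe_subtexts`) are hypothetical, and the two measurements shown are only examples of standard topological measurements.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and extract word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def sample_subtexts(tokens, size):
    """Split tokens into consecutive, non-overlapping windows of `size` words.

    A trailing remainder shorter than `size` is dropped (an assumption).
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def adjacency_network(tokens):
    """Link every pair of distinct words that occur next to each other."""
    neighbors = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore immediate repetitions of the same word
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def mean_degree(neighbors):
    """Average number of distinct neighbors per word."""
    if not neighbors:
        return 0.0
    return sum(len(n) for n in neighbors.values()) / len(neighbors)


def mean_clustering(neighbors):
    """Average local clustering coefficient; nodes with degree < 2 count as 0."""
    if not neighbors:
        return 0.0
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = sorted(nbrs)
        # Count links that exist between pairs of this node's neighbors.
        links = sum(
            1
            for i, u in enumerate(nbr_list)
            for v in nbr_list[i + 1:]
            if v in neighbors[u]
        )
        total += links / (k * (k - 1) / 2)
    return total / len(neighbors)


def describe_subtexts(text, size):
    """Return (mean degree, mean clustering) for each subtext of `size` words."""
    results = []
    for window in sample_subtexts(tokenize(text), size):
        net = adjacency_network(window)
        results.append((mean_degree(net), mean_clustering(net)))
    return results
```

Calling `describe_subtexts(book_text, size)` on a full book gives one feature pair per subtext. Comparing how much these values vary across subtexts is one simple way to explore the kind of stability the abstract refers to, although the paper's own measurements and sampling procedure may differ.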


research
07/28/2015

Classifying informative and imaginative prose using complex networks

Statistical methods have been widely employed in recent years to grasp m...
research
09/17/2015

Network analysis of named entity co-occurrences in written texts

The use of methods borrowed from statistics and physics to analyze writt...
research
06/25/2016

Word sense disambiguation via bipartite representation of complex networks

In recent years, concepts and methods of complex networks have been empl...
research
06/30/2015

A complex network approach to stylometry

Statistical methods have been widely employed to study the fundamental p...
research
10/20/2016

Authorship Attribution Based on Life-Like Network Automata

The authorship attribution is a problem of considerable practical and te...
research
04/09/2015

Concentric network symmetry grasps authors' styles in word adjacency networks

Several characteristics of written texts have been inferred from statist...
research
07/18/2021

A pattern recognition approach for distinguishing between prose and poetry

Poetry and prose are written artistic expressions that help us to apprec...
