Measuring Word Significance using Distributed Representations of Words

08/10/2015
by   Adriaan M. J. Schakel, et al.

Distributed representations of words as real-valued vectors in a relatively low-dimensional space aim at extracting syntactic and semantic features from large text corpora. A recently introduced neural network, named word2vec (Mikolov et al., 2013a; Mikolov et al., 2013b), was shown to encode semantic information in the direction of the word vectors. This brief report proposes using the length of the vectors, together with the term frequency, as a measure of word significance in a corpus. Experimental evidence from a domain-specific corpus of abstracts is presented to support this proposal. A useful visualization technique for text corpora emerges, in which words are mapped onto a two-dimensional plane and automatically ranked by significance.
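
The core idea of the abstract, pairing each word's term frequency with the length of its vector, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names `significance_map` and `rank_within_frequency`, the toy tokens, and the hand-written two-dimensional vectors are hypothetical, and real vectors would come from a trained word2vec model. The ordering rule (longer vector first among words of equal frequency) is one plausible reading of the proposal, not the authors' exact procedure.

```python
import math
from collections import Counter


def significance_map(tokens, vectors):
    """Map each word to a (term_frequency, vector_length) point in the plane."""
    tf = Counter(tokens)
    points = {}
    for word, vec in vectors.items():
        if word in tf:
            # Euclidean (L2) length of the word vector.
            length = math.sqrt(sum(x * x for x in vec))
            points[word] = (tf[word], length)
    return points


def rank_within_frequency(points):
    """Order words by ascending frequency, then by descending vector length."""
    return sorted(points, key=lambda w: (points[w][0], -points[w][1]))


# Toy corpus and hand-made vectors (hypothetical data, not from the paper).
tokens = ["the", "quark", "the", "boson", "the", "of", "of", "quark"]
vectors = {
    "the": [0.1, 0.0],
    "of": [0.0, 0.2],
    "quark": [3.0, 4.0],
    "boson": [0.6, 0.8],
}

points = significance_map(tokens, vectors)
ranking = rank_within_frequency(points)
```

Each entry of `points` is a coordinate pair that could be plotted directly, with term frequency on one axis and vector length on the other, which corresponds to the two-dimensional word map the abstract mentions.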
