Fast Extraction of Word Embedding from Q-contexts

by Junsheng Kong, et al.
University of Cambridge
South China University of Technology International Student Union
Tsinghua University

The notion of word embedding plays a fundamental role in natural language processing (NLP). However, pre-training word embeddings for a very large-scale vocabulary is computationally challenging for most existing methods. In this work, we show that with merely a small fraction of contexts (Q-contexts) that are typical in the whole corpus (and their mutual information with words), one can construct high-quality word embeddings with negligible errors. Mutual information between contexts and words can be encoded canonically as a sampling state, so Q-contexts can be constructed quickly. Furthermore, we present an efficient and effective WEQ method, which is capable of extracting word embeddings directly from these typical contexts. In practical scenarios, our algorithm runs 11∼13 times faster than well-established methods. By comparing with well-known methods such as matrix factorization, word2vec, GloVe and fastText, we demonstrate that our method achieves comparable performance on a variety of downstream NLP tasks while maintaining run-time and resource advantages over all these baselines.
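To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' WEQ implementation) of the general recipe the abstract describes: compute a word-by-context mutual-information matrix restricted to a small sampled subset of contexts, then factorize it to obtain word embeddings. The toy data, the uniform sampling, and all sizes here are assumptions for illustration; the paper's contribution is a principled way to pick *typical* contexts via a sampling-state construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy co-occurrence counts: rows = words, columns = contexts.
n_words, n_contexts = 1000, 5000
counts = rng.poisson(0.05, size=(n_words, n_contexts)).astype(float)

# Sample a small fraction of contexts (stand-in for Q-contexts).
# Here we sample uniformly; the paper instead selects contexts that are
# typical of the corpus, which is what keeps the approximation error low.
q = 200
sampled = rng.choice(n_contexts, size=q, replace=False)
sub = counts[:, sampled]

# Positive pointwise mutual information between words and sampled contexts.
total = sub.sum()
p_word = sub.sum(axis=1, keepdims=True) / total
p_ctx = sub.sum(axis=0, keepdims=True) / total
p_joint = sub / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_joint / (p_word * p_ctx))
ppmi = np.nan_to_num(np.maximum(pmi, 0.0))

# Truncated SVD of the small word-by-Q-context matrix yields embeddings;
# factorizing q columns instead of all n_contexts is the source of speedup.
dim = 50
u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
embeddings = u[:, :dim] * np.sqrt(s[:dim])
print(embeddings.shape)  # (1000, 50)
```

Because the SVD runs on an `n_words × q` matrix rather than the full `n_words × n_contexts` matrix, the cost scales with the number of sampled contexts, which is the intuition behind the reported 11∼13× speedup.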


