CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model

by Florian Mai, et al.
Idiap Research Institute
Christian-Albrechts-Universität zu Kiel
University of Essex

Continuous Bag of Words (CBOW) is a powerful text embedding method. Due to its strong capability to encode word content, CBOW embeddings perform well on a wide range of downstream tasks while being efficient to compute. However, CBOW is not capable of capturing word order: the aggregation of word embeddings in CBOW is commutative, i.e., the embeddings of XYZ and ZYX are the same. To address this shortcoming, we propose a learning algorithm for the Compositional Matrix Space Model, which we call Continual Multiplication of Words (CMOW). Our algorithm is an adaptation of word2vec, so that it can be trained on large quantities of unlabeled text. We empirically show that CMOW better captures linguistic properties, but it is inferior to CBOW in memorizing word content. Motivated by these findings, we propose a hybrid model that combines the strengths of CBOW and CMOW. Our results show that the hybrid CBOW-CMOW model retains CBOW's strong ability to memorize word content while at the same time substantially improving its ability to encode other linguistic information by 8%. It also performs better on 8 out of 11 supervised downstream tasks, with an average improvement of 1.2%.
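The commutativity argument above can be illustrated with a minimal sketch. The toy vectors and matrices below are random stand-ins, not the paper's trained parameters: CBOW sums word vectors, so any reordering of the words yields the same embedding, whereas CMOW composes per-word matrices by matrix multiplication, which is order-sensitive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary: each word gets a CBOW vector and a
# CMOW matrix (random here for illustration only; the paper learns
# these with a word2vec-style objective on unlabeled text).
words = ["X", "Y", "Z"]
vecs = {w: rng.standard_normal(4) for w in words}
mats = {w: rng.standard_normal((2, 2)) for w in words}

def cbow(seq):
    # CBOW: sum of word vectors -- commutative, so word order is lost.
    return sum(vecs[w] for w in seq)

def cmow(seq):
    # CMOW: ordered matrix product, flattened to an embedding vector --
    # matrix multiplication is non-commutative, so order is retained.
    m = np.eye(2)
    for w in seq:
        m = m @ mats[w]
    return m.flatten()

print(np.allclose(cbow("XYZ"), cbow("ZYX")))  # True: same embedding
print(np.allclose(cmow("XYZ"), cmow("ZYX")))  # False: order matters
```

The hybrid model in the abstract concatenates the two kinds of embedding, keeping CBOW's content memorization alongside CMOW's order sensitivity.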




