Refinement of Unsupervised Cross-Lingual Word Embeddings

02/21/2020
by Magdalena Biesialska, et al.

Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages by making it possible to learn multilingual word representations even without any direct bilingual signal. Most existing methods are projection-based approaches that map pre-trained embeddings into a shared latent space. These methods mostly rely on an orthogonal transformation, which assumes that the language vector spaces are isomorphic. However, this assumption does not necessarily hold, especially for morphologically rich languages. In this paper, we propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings. The proposed model moves the vectors of words and their corresponding translations closer to each other and enforces length and center invariance, which yields better-aligned cross-lingual embeddings. The experimental results demonstrate the effectiveness of our approach: in most cases, it outperforms state-of-the-art methods on the bilingual lexicon induction task.
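
The following is a minimal NumPy sketch of the generic projection-based pipeline that the abstract builds on. It is not the authors' refinement model. It shows three pieces. First, length normalization followed by mean centering, which is one common way to obtain length and center invariance. Second, an orthogonal (Procrustes) mapping fitted on a dictionary of word pairs. Third, a self-learning loop that alternates between fitting the mapping and re-inducing the dictionary through nearest-neighbor bilingual lexicon induction. All function names are hypothetical. The data are synthetic: the target space is an exact rotation of the source space plus small noise. This is the idealized isomorphic case, which real embeddings of morphologically rich languages often violate.

```python
import numpy as np

def normalize(X):
    # Unit-length rows, then mean centering, then unit length again
    # (a common recipe for length- and center-invariance; assumed here).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    X = X - X.mean(axis=0, keepdims=True)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def procrustes(X, Z):
    # Orthogonal W minimizing ||X W - Z||_F for row-aligned pairs X[i] <-> Z[i]:
    # with U S V^T = SVD(X^T Z), the solution is W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Z)
    return U @ Vt

def nearest_neighbors(XW, Z):
    # Rows are unit length (W is orthogonal), so dot product = cosine similarity.
    return (XW @ Z.T).argmax(axis=1)

def self_learning(X, Z, seed_src, seed_tgt, iterations=5):
    # Hypothetical helper: alternate between fitting W on the current
    # dictionary and re-inducing the dictionary from nearest neighbors.
    src, tgt = np.asarray(seed_src), np.asarray(seed_tgt)
    for _ in range(iterations):
        W = procrustes(X[src], Z[tgt])
        src = np.arange(len(X))
        tgt = nearest_neighbors(X @ W, Z)
    return W, tgt

# Synthetic demo: the target space is a hidden random rotation of the source.
rng = np.random.default_rng(0)
n, d = 200, 50
X = normalize(rng.normal(size=(n, d)))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden orthogonal map
Z = normalize(X @ Q + 0.01 * rng.normal(size=(n, d)))

# Start from a small seed dictionary (the first 60 word pairs).
W, predicted = self_learning(X, Z, seed_src=range(60), seed_tgt=range(60))
accuracy = (predicted == np.arange(n)).mean()  # precision@1 for BLI
print(f"P@1 = {accuracy:.2f}")
```

Because the synthetic spaces are isomorphic by construction, an orthogonal map recovers the translations almost perfectly. When that assumption breaks, an orthogonal map alone is insufficient, and this is the gap that the refinement proposed in the paper targets.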
