Unsupervised Multilingual Alignment using Wasserstein Barycenter

01/28/2020
by Xin Lian, et al.

We study unsupervised multilingual alignment, the problem of finding word-to-word translations between multiple languages without using any parallel data. One popular strategy is to reduce multilingual alignment to the much simpler bilingual setting by picking one of the input languages as a pivot language that all translations pass through. However, it is well known that passing through a poorly chosen pivot language (such as English) may severely degrade translation quality, since the assumed transitive relations among all pairs of languages may not be enforced during training. Instead of going through a rather arbitrarily chosen pivot language, we propose to use the Wasserstein barycenter as a more informative "mean" language: it encapsulates information from all languages and minimizes all pairwise transportation costs. We evaluate our method on standard benchmarks and demonstrate state-of-the-art performance.
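To make the pivot-versus-barycenter contrast concrete, here is a minimal toy sketch. It is not the paper's method: the paper works with word embeddings, while this example assumes each "language" is a one-dimensional point cloud with equally weighted points, a case in which the 2-Wasserstein barycenter can be obtained by averaging sorted samples. All names (`w2_squared_1d`, `barycenter_1d`, `average_cost`, `toy_languages`) are hypothetical and chosen for illustration only.

```python
import random


def w2_squared_1d(x, y):
    """Squared 2-Wasserstein distance between two 1D point clouds of equal
    size with uniform weights: match points by rank and average the gaps."""
    xs, ys = sorted(x), sorted(y)
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)


def barycenter_1d(samples):
    """Uniformly weighted Wasserstein barycenter in 1D (toy assumption):
    average the sorted samples rank by rank."""
    sorted_samples = [sorted(s) for s in samples]
    k = len(sorted_samples)
    return [sum(column) / k for column in zip(*sorted_samples)]


def average_cost(center, samples):
    """Mean squared transport cost from every point cloud to `center`."""
    return sum(w2_squared_1d(center, s) for s in samples) / len(samples)


# Three synthetic "languages" (assumed toy data, not real embeddings).
random.seed(0)
toy_languages = [
    [random.gauss(mu, sigma) for _ in range(200)]
    for mu, sigma in [(0.0, 1.0), (3.0, 0.5), (-2.0, 2.0)]
]

barycenter = barycenter_1d(toy_languages)
barycenter_cost = average_cost(barycenter, toy_languages)

# Cost of routing everything through each input "language" as the pivot.
pivot_costs = [average_cost(pivot, toy_languages) for pivot in toy_languages]

print("barycenter cost:", barycenter_cost)
print("pivot costs:    ", pivot_costs)
```

In this toy setting the barycenter's average transport cost is never larger than that of any pivot, because each input point cloud is only one candidate "center" while the barycenter is the minimizer. Real embedding spaces are high-dimensional and would need a general optimal transport solver rather than sorting.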

research
12/08/2020

Globetrotter: Unsupervised Multilingual Translation from Visual Alignment

Multi-language machine translation without parallel corpora is challengi...
research
11/01/2018

GlobalTrait: Personality Alignment of Multilingual Word Embeddings

We propose a multilingual model to recognize Big Five Personality traits...
research
04/09/2020

On the Language Neutrality of Pre-trained Multilingual Representations

Multilingual contextual embeddings, such as multilingual BERT (mBERT) an...
research
01/28/2023

Multilingual Sentence Transformer as A Multilingual Word Aligner

Multilingual pretrained language models (mPLMs) have shown their effecti...
research
01/23/2023

Noisy Parallel Data Alignment

An ongoing challenge in current natural language processing is how its m...
research
10/11/2019

How Does Language Influence Documentation Workflow? Unsupervised Word Discovery Using Translations in Multiple Languages

For language documentation initiatives, transcription is an expensive re...
research
11/02/2018

Unsupervised Hyperalignment for Multilingual Word Embeddings

We consider the problem of aligning continuous word representations, lea...
