An Autoencoder Approach to Learning Bilingual Word Representations

02/06/2014
by Sarath Chandar A P, et al.

Cross-language learning allows us to use training data from one language to build models for a different language. Many approaches to bilingual learning require word-level alignment of sentences from parallel corpora. In this work we explore autoencoder-based methods for cross-language learning of vectorial word representations that are aligned between two languages, without relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. Since training autoencoders on word observations presents certain computational issues, we propose and compare several variations adapted to this setting. We also propose an explicit correlation-maximizing regularizer that leads to significant improvement in performance. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). These experiments demonstrate that our approaches are competitive with the state of the art, achieving up to 10-14 percentage point improvements over the best previously reported results on this task.
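The objective sketched in the abstract can be illustrated with a toy example: encode the bag-of-words of each sentence in an aligned pair into a shared hidden space, reconstruct both languages from either encoding, and add a term rewarding correlation between the two encodings. The following is a minimal numpy sketch under assumed names and dimensions, not the paper's actual code or hyperparameters:

```python
import numpy as np

# Hypothetical minimal sketch of a bilingual bag-of-words autoencoder
# objective; weight names, sizes, and the squared-error loss are
# illustrative assumptions, not the paper's implementation.
rng = np.random.default_rng(0)

VOCAB_X, VOCAB_Y, HIDDEN = 50, 60, 16  # toy vocabulary / hidden sizes

# Encoder and decoder weights for each language, sharing one hidden space.
W_enc_x = rng.normal(scale=0.1, size=(VOCAB_X, HIDDEN))
W_enc_y = rng.normal(scale=0.1, size=(VOCAB_Y, HIDDEN))
W_dec_x = rng.normal(scale=0.1, size=(HIDDEN, VOCAB_X))
W_dec_y = rng.normal(scale=0.1, size=(HIDDEN, VOCAB_Y))

def encode(bow, W):
    """Sigmoid encoding of a batch of bag-of-words vectors."""
    return 1.0 / (1.0 + np.exp(-bow @ W))

def reconstruction_loss(target_bow, h, W_dec):
    """Mean squared error between the decoded output and the target bag-of-words."""
    return np.mean((h @ W_dec - target_bow) ** 2)

def correlation(hx, hy):
    """Per-unit correlation between the two encodings of aligned pairs,
    averaged over hidden units. Subtracting it from the loss makes
    higher correlation reduce the objective (the explicit regularizer)."""
    hx_c = hx - hx.mean(axis=0)
    hy_c = hy - hy.mean(axis=0)
    num = (hx_c * hy_c).sum(axis=0)
    den = np.sqrt((hx_c ** 2).sum(axis=0) * (hy_c ** 2).sum(axis=0)) + 1e-8
    return float(np.mean(num / den))

# Toy batch of aligned sentence pairs as bag-of-words counts.
bow_x = rng.integers(0, 3, size=(8, VOCAB_X)).astype(float)
bow_y = rng.integers(0, 3, size=(8, VOCAB_Y)).astype(float)

hx, hy = encode(bow_x, W_enc_x), encode(bow_y, W_enc_y)

loss = (reconstruction_loss(bow_x, hx, W_dec_x)    # x reconstructs x
        + reconstruction_loss(bow_y, hx, W_dec_y)  # x reconstructs y (cross-lingual)
        + reconstruction_loss(bow_y, hy, W_dec_y)  # y reconstructs y
        + reconstruction_loss(bow_x, hy, W_dec_x)  # y reconstructs x (cross-lingual)
        - correlation(hx, hy))                     # correlation regularizer
print(np.isfinite(loss))
```

Minimizing this loss with any gradient method pushes aligned sentences toward similar hidden codes, which is what makes the learned word representations transfer across languages.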

Related research

Learning Multilingual Word Representations using a Bag-of-Words Autoencoder (01/08/2014)
Recent work on learning multilingual word representations usually relies...

Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders (08/09/2016)
Current approaches to learning vector representations of text that are c...

Subword Sampling for Low Resource Word Alignment (12/21/2020)
Annotation projection is an important area in NLP that can greatly contr...

On the Language Neutrality of Pre-trained Multilingual Representations (04/09/2020)
Multilingual contextual embeddings, such as multilingual BERT (mBERT) an...

An Ensemble Method for Producing Word Representations for the Greek Language (12/10/2019)
In this paper we present a new ensemble method, Continuous Bag-of-Skip-g...

Solving Arithmetic Word Problems Automatically Using Transformer and Unambiguous Representations (12/02/2019)
Constructing accurate and automatic solvers of math word problems has pr...

Bilingual Distributed Word Representations from Document-Aligned Comparable Data (09/24/2015)
We propose a new model for learning bilingual word representations from ...
