Polylingual Wordnet

03/04/2019
by Mihael Arcan, et al.

Princeton WordNet is one of the most important resources for natural language processing, but it is only available for English. While it has been translated into many other languages using the expand approach, this is an expensive manual process. It would therefore be beneficial to have a high-quality automatic translation approach that supports NLP techniques that rely on WordNet in new languages. Translating wordnets is fundamentally complex because all senses of a word must be translated, including low-frequency senses, which is very challenging for current machine translation approaches. For this reason, we leverage existing translations of WordNet in other languages to identify contextual information for wordnet senses from a large set of generic parallel corpora. We evaluate our approach using 10 translated wordnets for European languages. Our experiment shows a significant improvement over translation without any contextual information. Furthermore, we evaluate how the choice of pivot languages affects the performance of multilingual word sense disambiguation.
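
The idea of using existing wordnet translations to find sense-specific context in parallel corpora can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_context_sentences`, the `min_pivots` threshold, the token-matching rule, and the toy corpus are all assumptions made for illustration. The sketch keeps a parallel sentence only if the English lemma occurs in it and enough pivot languages contain one of the lemmas that their wordnets list for the same sense.

```python
from typing import Dict, List, Set

# One multi-parallel sentence: language code -> list of lowercased tokens.
ParallelSentence = Dict[str, List[str]]


def select_context_sentences(
    corpus: List[ParallelSentence],
    english_lemma: str,
    pivot_lemmas: Dict[str, Set[str]],
    min_pivots: int = 1,
) -> List[ParallelSentence]:
    """Return sentences whose English side contains `english_lemma` and whose
    pivot-language sides contain a known translation of the target sense in
    at least `min_pivots` languages (hypothetical selection rule)."""
    selected = []
    for sentence in corpus:
        if english_lemma not in sentence.get("en", []):
            continue
        # Count pivot languages that confirm the intended sense.
        matches = sum(
            1
            for lang, lemmas in pivot_lemmas.items()
            if any(lemma in sentence.get(lang, []) for lemma in lemmas)
        )
        if matches >= min_pivots:
            selected.append(sentence)
    return selected


# Toy corpus (invented for illustration): "bank" as riverside vs. institution.
corpus = [
    {
        "en": "she sat on the bank of the river".split(),
        "de": "sie saß am ufer des flusses".split(),
        "es": "se sentó en la orilla del río".split(),
    },
    {
        "en": "the bank raised its interest rates".split(),
        "de": "die bank erhöhte ihre zinsen".split(),
        "es": "el banco subió sus tipos de interés".split(),
    },
]

# Lemmas that existing wordnets might list for the riverside sense (assumed).
riverside = {"de": {"ufer"}, "es": {"orilla"}}

context = select_context_sentences(corpus, "bank", riverside, min_pivots=2)
```

In this toy example only the first sentence is selected, so a translation system given that sentence as context would see "bank" in its riverside sense. Varying the languages in `pivot_lemmas` mirrors, in a very simplified way, the question of how the choice of pivot languages affects disambiguation.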


