Extracting Bilingual Persian Italian Lexicon from Comparable Corpora Using Different Types of Seed Dictionaries

01/29/2017
by Ebrahim Ansari, et al.

Bilingual dictionaries are very important in various fields of natural language processing. In recent years, several approaches to extracting new bilingual lexicons from non-parallel (comparable) corpora have been proposed. Almost all of them use a small existing dictionary or other resource to build an initial list called the "seed dictionary". In this paper we discuss the use of different types of dictionaries as the starting list for creating a bilingual Persian-Italian lexicon from a comparable corpus. Our experiments apply state-of-the-art techniques to three different seed dictionaries: an existing dictionary, a dictionary created with a pivot-based schema, and a dictionary extracted from a small Persian-Italian parallel text. The main challenge of our approach is finding a way to combine the different dictionaries to produce a better and more accurate lexicon. To combine the seed dictionaries, we propose two different combination models and examine their effect on various comparable corpora with differing degrees of comparability. We conclude with a proposal for a new weighting system to improve the extracted lexicon. The experimental results produced by our implementation show the efficiency of our proposed models.
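To make the role of a seed dictionary concrete, here is a minimal sketch of the widely used context-vector approach to lexicon extraction from comparable corpora. It is an illustration under stated assumptions, not the authors' implementation: the function names (`merge_seeds`, `context_vectors`, `project`, `rank_translations`), the naive union used to merge seed dictionaries, the raw co-occurrence counts, and the transliterated toy sentences are all hypothetical. The paper's two combination models and its weighting system are not reproduced here.

```python
from collections import Counter
from math import sqrt


def merge_seeds(*seeds):
    """Naive union of seed dictionaries (an assumption, not the paper's combination models)."""
    merged = {}
    for seed in seeds:
        for word, translations in seed.items():
            merged.setdefault(word, set()).update(translations)
    return merged


def context_vectors(sentences, window=2):
    """Map each word to counts of the words that occur within `window` tokens of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def project(vector, seed):
    """Rewrite a source-language context vector in target-language terms via the seed dictionary."""
    projected = Counter()
    for word, count in vector.items():
        for translation in seed.get(word, ()):
            projected[translation] += count
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_translations(word, src_vectors, tgt_vectors, seed, top_n=3):
    """Rank target words by similarity to the projected context vector of `word`."""
    projected = project(src_vectors.get(word, Counter()), seed)
    scored = [(cosine(projected, vec), cand) for cand, vec in tgt_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand for score, cand in scored[:top_n] if score > 0]


# Toy, transliterated data for illustration only (not taken from the paper's corpora).
persian = [["dokhtar", "ketab", "mikhanad"], ["pesar", "ab", "minushad"]]
italian = [["ragazza", "legge", "libro"], ["ragazzo", "beve", "acqua"]]
seed_a = {"dokhtar": ["ragazza"], "pesar": ["ragazzo"]}
seed_b = {"mikhanad": ["legge"], "minushad": ["beve"]}

seed = merge_seeds(seed_a, seed_b)
candidates = rank_translations(
    "ketab", context_vectors(persian), context_vectors(italian), seed
)
```

In this toy setup, the unseen word `ketab` shares its projected context with `libro`, so `libro` ranks first. A realistic system would replace the raw counts with an association measure and would weight entries by which seed dictionary they came from.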


