Cross-Language Learning for Entity Matching

10/07/2021
by Ralph Peeters, et al.

Transformer-based matching methods have significantly advanced the state of the art for less-structured matching tasks involving textual entity descriptions. To excel on these tasks, these methods require a decent number of training pairs. Providing enough training data can be challenging, especially if a matcher for non-English entity descriptions is to be learned. Using the use case of matching product offers from different e-shops, this paper explores to what extent the performance of Transformer-based entity matchers can be improved by complementing a small set of training pairs in the target language, German in our case, with a larger set of English-language training pairs. Our experiments using different Transformers show that extending the German set with English pairs is always beneficial. The impact of adding the English pairs is especially high in low-resource settings in which only a rather small number of non-English pairs is available. As it is often possible to automatically gather English training pairs from the Web by using schema.org annotations, our results could prove relevant for many product matching scenarios targeting low-resource languages.
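
The training setup described above, a small German set complemented with a larger English set, can be illustrated with a minimal sketch. It is not the authors' code: the function name `build_training_set`, the pair layout `(offer_a, offer_b, label)`, the sample sizes, and the toy offers are all assumptions made for illustration, and the actual fine-tuning of a Transformer on the combined pairs is left out.

```python
import random
from typing import List, Tuple

# Hypothetical pair layout: (offer_a, offer_b, label),
# where label 1 means "same product" and 0 means "different products".
Pair = Tuple[str, str, int]


def build_training_set(
    german_pairs: List[Pair],
    english_pairs: List[Pair],
    n_german: int,
    n_english: int,
    seed: int = 42,
) -> List[Pair]:
    """Combine a small target-language sample with a larger English sample."""
    rng = random.Random(seed)
    # Draw without replacement; cap at the number of available pairs.
    german_sample = rng.sample(german_pairs, min(n_german, len(german_pairs)))
    english_sample = rng.sample(english_pairs, min(n_english, len(english_pairs)))
    combined = german_sample + english_sample
    # Shuffle so that both languages are interleaved during training.
    rng.shuffle(combined)
    return combined


# Toy data, invented for illustration only.
german_pairs: List[Pair] = [
    ("Samsung Galaxy S10 128GB schwarz", "Galaxy S10 von Samsung, 128 GB", 1),
    ("Bosch Akkuschrauber 18V", "Makita Akku-Bohrschrauber 18 V", 0),
    ("Lenovo ThinkPad T14 Notebook", "ThinkPad T14 Laptop von Lenovo", 1),
]
english_pairs: List[Pair] = [
    ("Samsung Galaxy S10 128GB black", "Galaxy S10 by Samsung, 128 GB", 1),
    ("Bosch cordless drill 18V", "Makita cordless drill driver 18 V", 0),
    ("Lenovo ThinkPad T14 laptop", "ThinkPad T14 notebook by Lenovo", 1),
    ("Canon EOS 90D body", "Nikon D7500 body", 0),
    ("Apple AirPods Pro", "AirPods Pro by Apple", 1),
]

# Low-resource setting: few German pairs, more English pairs.
train = build_training_set(german_pairs, english_pairs, n_german=2, n_english=4)
```

Varying `n_german` while holding `n_english` fixed is one way to mimic the low-resource settings the abstract refers to, under the assumption that the combined list is then passed to whatever matcher is being fine-tuned.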
