Soft Prompt Decoding for Multilingual Dense Retrieval

05/15/2023
by   Zhiqi Huang, et al.
0

In this work, we explore a Multilingual Information Retrieval (MLIR) task, where the collection includes documents in multiple languages. We demonstrate that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance. This is due to the heterogeneous and imbalanced nature of multilingual collections – some languages are better represented in the collection and some benefit from large-scale training data. To address this issue, we present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space. To address the challenges of data scarcity and imbalance, we introduce a knowledge distillation strategy. The teacher model is trained on rich English retrieval data, and by leveraging bi-text data, our distillation framework transfers its retrieval knowledge to the multilingual document encoder. Therefore, our approach does not require any multilingual retrieval training data. Extensive experiments on three MLIR datasets with a total of 15 languages demonstrate that KD-SPD significantly outperforms competitive baselines in all cases. We conduct extensive analyses to show that our method has less language bias and better zero-shot transfer ability towards new languages.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/26/2023

Cross-lingual Knowledge Transfer via Distillation for Multilingual Information Retrieval

In this paper, we introduce the approach behind our submission for the M...
research
10/07/2022

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Multilingual text-video retrieval methods have improved significantly in...
research
10/11/2022

Better Than Whitespace: Information Retrieval for Languages without Custom Tokenizers

Tokenization is a crucial step in information retrieval, especially for ...
research
12/30/2019

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

While billions of non-English speaking users rely on search engines ever...
research
08/24/2022

Improving video retrieval using multilingual knowledge transfer

Video retrieval has seen tremendous progress with the development of vis...
research
11/03/2021

Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval

Interactive and non-interactive model are the two de-facto standard fram...
research
03/27/2023

Lexicon-Enhanced Self-Supervised Training for Multilingual Dense Retrieval

Recent multilingual pre-trained models have shown better performance in ...

Please sign up or login with your details

Forgot password? Click here to reset