BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval

05/18/2023
by Shicheng Xu, et al.

Dense retrieval has shown promise in first-stage retrieval when trained on in-domain labeled datasets. However, previous studies have found that dense retrieval generalizes poorly to unseen domains because it weakly models a domain-invariant and interpretable feature: the matching signal between two texts, which is the essence of information retrieval. In this paper, we propose BERM, a novel method that improves the generalization of dense retrieval by capturing the matching signal. The matching signal has two properties: fully fine-grained expression and query-oriented saliency. Accordingly, BERM segments a single passage into multiple units and imposes two unit-level requirements on the representation as training constraints to obtain an effective matching signal. The first is semantic unit balance and the second is essential matching unit extractability. The unit-level view and balanced semantics make the representation express the text in a fine-grained manner. Essential matching unit extractability makes the passage representation sensitive to the given query, so that it extracts the pure matching information from a passage containing complex context. Experiments on BEIR show that our method can be effectively combined with different dense retrieval training methods (vanilla, hard negative mining, and knowledge distillation) to improve their generalization ability without any additional inference overhead or target-domain data.
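
To make the unit-level view concrete, the following is a minimal sketch and not the authors' implementation. It assumes that units are sentences and that "balance" can be scored as the KL divergence between a softmax over passage-to-unit similarity scores and a uniform distribution. Both choices, and the names `split_units` and `balance_penalty`, are hypothetical and used only for illustration.

```python
import math
import re


def split_units(passage):
    """Split a passage into sentence-level units (an assumed segmentation)."""
    parts = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [p for p in parts if p]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def balance_penalty(unit_scores):
    """KL(softmax(unit_scores) || uniform).

    unit_scores are assumed to be similarities between the passage
    representation and each unit. The penalty is 0 when every unit is
    weighted equally and grows as one unit dominates.
    """
    probs = softmax(unit_scores)
    n = len(probs)
    return sum(p * math.log(p * n) for p in probs if p > 0)


units = split_units("Dense retrieval encodes text. It uses vectors! Does it generalize?")
even = balance_penalty([1.0, 1.0, 1.0])    # all units weighted equally
skewed = balance_penalty([5.0, 0.0, 0.0])  # one unit dominates
```

Under these assumptions, a training constraint in this spirit would push `balance_penalty` toward zero, so that no single unit dominates the passage representation.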

research
12/01/2022

NIR-Prompt: A Multi-task Generalized Neural Information Retrieval Training Framework

Information retrieval aims to find information that meets users' needs f...
research
12/13/2022

Domain Adaptation for Dense Retrieval through Self-Supervision by Pseudo-Relevance Labeling

Although neural information retrieval has witnessed great improvements, ...
research
04/06/2022

Improving Multi-task Generalization Ability for Neural Text Matching via Prompt Learning

Text matching is a fundamental technique in both information retrieval a...
research
10/14/2021

Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations

Dense retrieval (DR) methods conduct text retrieval by first encoding te...
research
08/11/2022

Disentangled Modeling of Domain and Relevance for Adaptable Dense Retrieval

Recent advance in Dense Retrieval (DR) techniques has significantly impr...
research
10/21/2022

Reusing Keywords for Fine-grained Representations and Matchings

Question retrieval aims to find the semantically equivalent questions fo...
research
08/13/2021

PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval

Recently, dense passage retrieval has become a mainstream approach to fi...
