Hard Negatives or False Negatives: Correcting Pooling Bias in Training Neural Ranking Models

09/12/2022
by Yinqiong Cai, et al.

Neural ranking models (NRMs) have become one of the most important techniques in information retrieval (IR). Due to the scarcity of relevance labels, the training of NRMs relies heavily on negative sampling over unlabeled data. In general machine learning scenarios, it has been shown that training with hard negatives (i.e., samples that are close to positives) can lead to better performance. Surprisingly, our empirical studies in IR find the opposite: when top-ranked results from a stronger retriever (excluding the labeled positives) are sampled as negatives, the learned NRM performs even worse. Based on our investigation, the superficial reason is that a stronger retriever places more false negatives (i.e., unlabeled positives) among its top-ranked results, which may hurt the training process. The root cause is the pooling bias introduced during dataset construction, where annotators judge and label only the very few samples selected by some basic retrievers. In principle, therefore, the false negative issue in training NRMs can be formulated as learning from labeled datasets with pooling bias. To solve this problem, we propose a novel Coupled Estimation Technique (CET) that learns a relevance model and a selection model simultaneously to correct the pooling bias for training NRMs. Empirical results on three retrieval benchmarks show that NRMs trained with our technique achieve significant gains in ranking effectiveness over baseline strategies.
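
The abstract's core idea, jointly estimating a relevance model and a selection model so that probable false negatives among sampled hard negatives are down-weighted, can be illustrated with a small sketch. The code below is a hypothetical toy illustration in PyTorch, not the authors' CET implementation: every name (RelevanceModel, SelectionModel, coupled_loss) and the particular weighting and selection objectives are assumptions made for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 64  # assumed embedding size for this toy example


class RelevanceModel(nn.Module):
    """Scores (query, document) pairs; a stand-in for the neural ranking model."""
    def __init__(self, dim=EMB_DIM):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, q, d):
        return self.scorer(torch.cat([q, d], dim=-1)).squeeze(-1)


class SelectionModel(nn.Module):
    """Estimates the probability that a (query, document) pair was pooled and judged."""
    def __init__(self, dim=EMB_DIM):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, q, d):
        return torch.sigmoid(self.scorer(torch.cat([q, d], dim=-1))).squeeze(-1)


def coupled_loss(rel_model, sel_model, q, pos, negs):
    """Weight each hard negative by its estimated selection probability, so documents
    that were likely never judged (potential false negatives) contribute less."""
    B, K, D = negs.shape
    q_exp = q.unsqueeze(1).expand(B, K, D)

    pos_score = rel_model(q, pos)        # [B]
    neg_scores = rel_model(q_exp, negs)  # [B, K]
    sel_prob = sel_model(q_exp, negs)    # [B, K], probability each negative was judged

    # Ranking loss: a softmax contrastive loss in which every negative's contribution
    # is scaled by its (detached) selection probability.
    w = sel_prob.detach()
    exp_pos = pos_score.exp()
    exp_neg = (w * neg_scores.exp()).sum(dim=1)
    rank_loss = -(exp_pos / (exp_pos + exp_neg)).log().mean()

    # Selection loss: the labeled positive was certainly judged; sampled negatives are
    # treated as unjudged on average. This is a crude proxy objective for illustration.
    sel_pos = sel_model(q, pos)
    sel_loss = F.binary_cross_entropy(sel_pos, torch.ones_like(sel_pos)) + \
               F.binary_cross_entropy(sel_prob, torch.zeros_like(sel_prob))

    return rank_loss + sel_loss


# Toy usage with random vectors standing in for encoded queries and documents.
B, K = 8, 4
rel, sel = RelevanceModel(), SelectionModel()
opt = torch.optim.Adam(list(rel.parameters()) + list(sel.parameters()), lr=1e-3)

q = torch.randn(B, EMB_DIM)
pos = torch.randn(B, EMB_DIM)
negs = torch.randn(B, K, EMB_DIM)  # hard negatives sampled from a retriever's top ranks

loss = coupled_loss(rel, sel, q, pos, negs)
opt.zero_grad()
loss.backward()
opt.step()

Detaching the selection probabilities in the ranking loss keeps that loss from pushing the selection model toward marking every hard negative as unjudged; how the two models are actually coupled and optimized in CET is specified in the full paper.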

research
04/15/2019

An Axiomatic Approach to Regularizing Neural Ranking Models

Axiomatic information retrieval (IR) seeks a set of principle properties...
research
10/21/2022

SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval

Sampling proper negatives from a large document pool is vital to effecti...
research
04/16/2021

Optimizing Dense Retrieval Model Training with Hard Negatives

Ranking has always been one of the top concerns in information retrieval...
research
12/29/2020

Meta Adaptive Neural Ranking with Contrastive Synthetic Supervision

Neural Information Retrieval (Neu-IR) models have shown their effectiven...
research
01/27/2022

Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives

This paper introduces Ranking Info Noise Contrastive Estimation (RINCE),...
research
09/14/2021

YES SIR! Optimizing Semantic Space of Negatives with Self-Involvement Ranker

Pre-trained model such as BERT has been proved to be an effective tool f...
research
01/30/2019

Learning Fast Matching Models from Weak Annotations

This paper proposes a novel training scheme for fast matching models in ...
