A Large Scale Search Dataset for Unbiased Learning to Rank

by   Lixin Zou, et al.

The unbiased learning to rank (ULTR) problem has been greatly advanced by recent deep learning techniques and well-designed debiasing algorithms. However, promising results on existing benchmark datasets may not transfer to practical scenarios due to the following shortcomings of those popular benchmark datasets: (1) outdated semantic feature extraction, where state-of-the-art large-scale pre-trained language models like BERT cannot be exploited because the original text is missing; (2) incomplete display features for in-depth study of ULTR, e.g., the missing displayed abstracts of documents needed for analyzing click necessity bias; (3) a lack of real-world user feedback, leading to the prevalence of synthetic datasets in empirical studies. To overcome these shortcomings, we introduce the Baidu-ULTR dataset. It contains 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries, which is orders of magnitude larger than existing datasets. Baidu-ULTR provides: (1) the original semantic features and a pre-trained language model for easy use; (2) sufficient display information, such as position, displayed height, and displayed abstract, enabling comprehensive study of different biases with advanced techniques such as causal discovery and meta-learning; and (3) rich user feedback on search result pages (SERPs), such as dwell time, allowing for user engagement optimization and promoting the exploration of multi-task learning in ULTR. In this paper, we present the design principles of Baidu-ULTR and the performance of benchmark ULTR algorithms on this new data resource, favoring the exploration of ranking for long-tail queries and pre-training tasks for ranking. The Baidu-ULTR dataset and corresponding baseline implementations are available at https://github.com/ChuXiaokai/baidu_ultr_dataset.
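For readers unfamiliar with the debiasing algorithms the abstract refers to, the classic ULTR idea is inverse propensity scoring (IPS): clicks from biased logs are re-weighted by the inverse of each position's estimated examination probability. The sketch below is a minimal illustration of that idea, not code from the Baidu-ULTR repository; the function names and the example propensity values are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def ips_pointwise_loss(clicks, scores, propensities):
    """Pointwise IPS loss for debiased click training (illustrative).

    clicks       -- 0/1 click labels from the search log
    scores       -- ranker scores for the logged documents
    propensities -- estimated examination probabilities per position
                    (hypothetical values; in practice learned, e.g.
                    via randomization or EM on click logs)

    Each clicked document contributes a log-loss term up-weighted by
    1 / propensity, correcting for position bias in expectation.
    """
    loss = 0.0
    for c, s, p in zip(clicks, scores, propensities):
        if c:  # only clicks contribute in the pointwise IPS estimator
            loss += -math.log(sigmoid(s)) / p
    return loss

# Toy session: the clicked document at a low-examination position
# (propensity 0.5) is weighted twice as strongly as a top-position click would be.
loss = ips_pointwise_loss(clicks=[1, 0], scores=[2.0, 0.0], propensities=[0.5, 1.0])
```

Algorithms of this family (IPS, dual learning, regression EM) are among the baselines typically benchmarked on ULTR datasets such as the one described above.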

