CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval

04/05/2023
by Xing Wu, et al.

A growing number of techniques have been proposed to improve the performance of passage retrieval. As an effective representation-bottleneck pre-training technique, the contextual masked auto-encoder uses a contextual embedding to assist in reconstructing passages. However, it relies on a single auto-encoding pre-task for dense representation pre-training. This study brings multi-view modeling to the contextual masked auto-encoder. First, multi-view representation uses both dense and sparse vectors as complementary views, aiming to capture sentence semantics from different aspects. Second, the multi-view decoding paradigm uses both auto-encoding and auto-regressive decoders in representation-bottleneck pre-training, aiming to provide both reconstructive and generative signals for better contextual representation pre-training. We refer to this multi-view pre-training method as CoT-MAE v2. Extensive experiments show that CoT-MAE v2 is effective and robust on large-scale passage retrieval benchmarks and out-of-domain zero-shot benchmarks.
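
To make the two "views" concrete, the following is a minimal PyTorch sketch of the architecture the abstract describes. It is not the authors' released implementation: the class and method names (MultiViewBottleneck, encode_views, decode_views), the bert-base-uncased backbone, and the SPLADE-style max-pooled sparse head are illustrative assumptions. Only the overall layout follows the abstract: one deep encoder that yields dense and sparse views of a passage, plus shallow auto-encoding and auto-regressive decoders that read the contextual embedding.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiViewBottleneck(nn.Module):
    """Sketch of multi-view bottleneck pre-training: a deep encoder yields dense
    and sparse views of a passage; two shallow decoders (auto-encoding and
    auto-regressive) read the dense contextual embedding of a neighbouring passage."""

    def __init__(self, model_name: str = "bert-base-uncased", nhead: int = 12):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        vocab = self.encoder.config.vocab_size

        # LM head shared by the sparse view and both decoders,
        # tied to the encoder's word-embedding matrix.
        self.lm_head = nn.Linear(hidden, vocab, bias=False)
        self.lm_head.weight = self.encoder.embeddings.word_embeddings.weight

        # Shallow single-layer decoders: bidirectional (reconstructive signal)
        # and causal (generative signal).
        self.ae_decoder = nn.TransformerEncoderLayer(hidden, nhead, batch_first=True)
        self.ar_decoder = nn.TransformerEncoderLayer(hidden, nhead, batch_first=True)

    def encode_views(self, input_ids, attention_mask):
        """Multi-view representation: dense [CLS] vector + sparse lexical vector."""
        hiddens = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state  # [B, L, H]
        dense = hiddens[:, 0]                                                    # [B, H]
        # Sparse view: per-token vocabulary logits, ReLU'd and max-pooled over
        # the sequence (a SPLADE-style pooling, chosen here for illustration).
        token_logits = torch.relu(self.lm_head(hiddens))                         # [B, L, V]
        token_logits = token_logits * attention_mask.unsqueeze(-1)               # zero out padding
        sparse = token_logits.amax(dim=1)                                        # [B, V]
        return dense, sparse

    def decode_views(self, dense_ctx, target_ids):
        """Multi-view decoding: prepend the contextual dense vector to the
        target passage's embeddings and run both shallow decoders."""
        tgt_emb = self.encoder.embeddings(input_ids=target_ids)                  # [B, L, H]
        seq = torch.cat([dense_ctx.unsqueeze(1), tgt_emb], dim=1)                # [B, L+1, H]

        # Auto-encoding decoder: bidirectional attention; position i+1 scores
        # target token i, so an MLM loss on masked positions gives the
        # reconstructive signal.
        ae_logits = self.lm_head(self.ae_decoder(seq))[:, 1:]                    # [B, L, V]

        # Auto-regressive decoder: causal attention; position i predicts
        # target token i, giving the generative (next-token) signal.
        L = seq.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf"), device=seq.device), diagonal=1)
        ar_logits = self.lm_head(self.ar_decoder(seq, src_mask=causal))[:, :-1]  # [B, L, V]
        return ae_logits, ar_logits
```

In a pre-training loop under these assumptions, the auto-encoding logits would be scored with an MLM loss over masked target tokens and the auto-regressive logits with a next-token cross-entropy loss, while the dense and sparse views are what get fine-tuned and indexed for retrieval.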

Related research

01/09/2022 · Auto-Encoder based Co-Training Multi-View Representation Learning
Multi-view learning is a learning problem that utilizes the various repr...

04/20/2023 · CoT-MoTE: Exploring ConTextual Masked Auto-Encoder Pre-training with Mixture-of-Textual-Experts for Passage Retrieval
Passage retrieval aims to retrieve relevant passages from large collecti...

06/07/2023 · ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems
Dialogue response selection aims to select an appropriate response from ...

06/08/2022 · Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks
For unsupervised pretraining, mask-reconstruction pretraining (MRP) appr...

05/24/2022 · RetroMAE: Pre-training Retrieval-oriented Transformers via Masked Auto-Encoder
Pre-trained models have demonstrated superior power on many important ta...

11/16/2022 · RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models
To better support retrieval applications such as web search and question...

12/15/2022 · MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers
Dense retrieval aims to map queries and passages into low-dimensional ve...
