Data-Efficient Autoregressive Document Retrieval for Fact Verification

11/17/2022
by   James Thorne, et al.

Document retrieval is a core component of many knowledge-intensive natural language processing tasks, such as fact verification and question answering. Sources of textual knowledge, such as Wikipedia articles, condition the generation of answers from the models. Recent advances in retrieval use sequence-to-sequence models to incrementally predict the title of the appropriate Wikipedia page given a query. However, this method requires supervision in the form of human annotation to label which Wikipedia pages contain appropriate context. This paper introduces a distant-supervision method that requires no annotation to train autoregressive retrievers, which attain competitive R-Precision and Recall in a zero-shot setting. Furthermore, we show that with task-specific supervised fine-tuning, autoregressive retrieval performance for two Wikipedia-based fact verification tasks can approach or even exceed that of full supervision while using less than 1/4 of the annotated data, indicating possible directions for data-efficient autoregressive retrieval.
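
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how R-Precision and Recall are commonly computed for a ranked list of predicted page titles. It uses the standard textbook definitions, not necessarily the paper's exact evaluation code; the function names, the cutoff `k`, and the example titles are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def r_precision(ranked: List[str], relevant: Iterable[str]) -> float:
    """Fraction of the top-R ranked titles that are relevant,
    where R is the number of relevant titles for the query."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    r = len(relevant_set)
    hits = sum(1 for title in ranked[:r] if title in relevant_set)
    return hits / r


def recall_at_k(ranked: List[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of the relevant titles that appear in the top-k ranked titles."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(relevant_set & set(ranked[:k])) / len(relevant_set)


# Hypothetical example: titles predicted for one claim, best first.
predicted = ["Page A", "Page B", "Page C", "Page D"]
gold = {"Page A", "Page C"}

rp = r_precision(predicted, gold)        # top-2 contains one of two gold pages
rec3 = recall_at_k(predicted, gold, 3)   # top-3 contains both gold pages
```

R-Precision adapts its cutoff to the number of gold pages per query, whereas Recall at a fixed `k` rewards any ranking that places the gold pages within the first `k` predictions.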

research
03/31/2017

Reading Wikipedia to Answer Open-Domain Questions

This paper proposes to tackle open-domain question answering using Wiki...
research
05/19/2023

Evaluation of medium-large Language Models at zero-shot closed book generative question answering

Large language models (LLMs) have garnered significant attention, but th...
research
03/09/2023

Can a Frozen Pretrained Language Model be used for Zero-shot Neural Retrieval on Entity-centric Questions?

Neural document retrievers, including dense passage retrieval (DPR), hav...
research
12/16/2021

Towards Unsupervised Dense Information Retrieval with Contrastive Learning

Information retrieval is an important component in natural language proc...
research
05/04/2023

Chain-of-Skills: A Configurable Model for Open-domain Question Answering

The retrieval model is an indispensable component for real-world knowled...
research
04/25/2022

C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval

Pretrained language models have improved effectiveness on numerous tasks...
research
04/11/2021

Fast Linking of Mathematical Wikidata Entities in Wikipedia Articles Using Annotation Recommendation

Mathematical information retrieval (MathIR) applications such as semanti...
