Generation-Augmented Retrieval for Open-domain Question Answering

by   Yuning Mao, et al.

Conventional sparse retrieval methods such as TF-IDF and BM25 are simple and efficient, but solely rely on lexical overlap without semantic matching. Recent dense retrieval methods learn latent representations to tackle the lexical mismatch problem, while being more computationally expensive and insufficient for exact matching as they embed the text sequence into a single vector with limited capacity. In this paper, we present Generation-Augmented Retrieval (GAR), a query expansion method that augments a query with relevant contexts through text generation. We demonstrate on open-domain question answering that the generated contexts significantly enrich the semantics of the queries and thus GAR with sparse representations (BM25) achieves comparable or better performance than the state-of-the-art dense methods such as DPR <cit.>. We show that generating various contexts of a query is beneficial as fusing their results consistently yields better retrieval accuracy. Moreover, as sparse and dense representations are often complementary, GAR can be easily combined with DPR to achieve even better performance. Furthermore, GAR achieves the state-of-the-art performance on the Natural Questions and TriviaQA datasets under the extractive setting when equipped with an extractive reader, and consistently outperforms other retrieval methods when the same generative reader is used.


page 1

page 2

page 3

page 4


SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval

We introduce SPARTA, a novel neural retrieval method that shows great pr...

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

We propose EAR, a query Expansion And Reranking approach for improving p...

Unsupervised Dense Retrieval Deserves Better Positive Pairs: Scalable Augmentation with Query Extraction and Generation

Dense retrievers have made significant strides in obtaining state-of-the...

An Unsupervised Model with Attention Autoencoders for Question Retrieval

Question retrieval is a crucial subtask for community question answering...

Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?

Despite their recent popularity and well known advantages, dense retriev...

Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

We propose a simple and efficient multi-hop dense retrieval approach for...

Passage Retrieval for Outside-Knowledge Visual Question Answering

In this work, we address multi-modal information needs that contain text...

Please sign up or login with your details

Forgot password? Click here to reset