Sharpness-aware minimization (SAM) is a recently proposed method that mi...
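To make the two-step update behind sharpness-aware minimization concrete, here is a minimal NumPy sketch under generic assumptions: the `loss_grad` callable, the `rho` radius, the learning rate, and the toy quadratic objective are illustrative placeholders, not details taken from this abstract.

```python
import numpy as np

def sam_step(w, loss_grad, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) step (illustrative sketch).

    1. Take the gradient at the current weights.
    2. Move to the approximate worst-case nearby point w + rho * g / ||g||.
    3. Update the original weights using the gradient taken at that point.
    """
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent direction within the rho-ball
    g_adv = loss_grad(w + eps)                    # gradient at the perturbed weights
    return w - lr * g_adv                         # descend with the sharpness-aware gradient

# Toy quadratic objective, purely for demonstration.
loss_grad = lambda w: 2.0 * (w - 3.0)
w = np.array([0.0])
for _ in range(50):
    w = sam_step(w, loss_grad)
print(w)  # ends up near the minimizer at 3.0
```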
State-of-the-art neural models typically encode document-query pairs usi...
The allure of superhuman-level capabilities has led to considerable inte...
Self-supervised contrastive representation learning has proved incredibl...
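As a concrete reference point for contrastive representation learning, the sketch below implements an InfoNCE-style objective on two augmented views of a batch. The function name, temperature value, and random embeddings are illustrative assumptions, not the setup of this particular paper.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style contrastive loss on two views of the same batch (sketch).

    z1[i] and z2[i] embed two augmentations of example i; every z2[j] with
    j != i acts as a negative for z1[i].
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                   # pairwise cosine similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(z1))                           # positives sit on the diagonal
    return -log_probs[idx, idx].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(info_nce(z + 0.01 * rng.normal(size=z.shape), z))
```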
State-of-the-art models in natural language processing rely on separate ...
In real-world systems, models are frequently updated as more data become...
In the era of pre-trained language models, Transformers are the de facto...
When experiencing an information need, users want to engage with an expe...
This paper proposes Omnidirectional Representations from Transformers (O...
Training modern neural networks is an inherently noisy process that can ...
Detecting out-of-distribution (OOD) examples is critical in many applica...
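For context, one widely used OOD-scoring baseline is the maximum softmax probability. The sketch below shows that baseline only; the threshold and example logits are invented for illustration and are not drawn from this abstract.

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability (MSP) OOD score: lower means more likely OOD."""
    z = logits - logits.max(axis=1, keepdims=True)     # stabilize the softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

# Flag inputs whose confidence falls below a threshold chosen on held-out data.
logits = np.array([[5.0, 0.1, 0.2],      # confidently in-distribution
                   [0.4, 0.5, 0.45]])    # ambiguous, possibly OOD
print(msp_score(logits) < 0.6)           # -> [False  True]
```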
There are two major classes of natural language grammars – the dependenc...
Transformers do not scale very well to long sequence lengths largely bec...
Work in information retrieval has largely been centered around ranking a...
Transformer model architectures have garnered immense interest lately du...
Large generative language models such as GPT-2 are well-known for their ...
Achieving state-of-the-art performance on natural language understanding...
The dot product self-attention is known to be central and indispensable ...
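Since this sentence centers on dot-product self-attention, a single-head scaled dot-product self-attention layer is sketched below in NumPy; the projection matrices, sequence length, and model dimension are illustrative assumptions.

```python
import numpy as np

def dot_product_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention (illustrative sketch)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq, seq) pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # mix values by attention weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                            # 4 tokens, model dim 8
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(dot_product_self_attention(x, wq, wk, wv).shape)  # (4, 8)
```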
Modern machine learning models are often trained on examples with noisy ...
Work in information retrieval has traditionally focused on ranking and r...
This paper seeks to develop a deeper understanding of the fundamental pr...
We propose Sparse Sinkhorn Attention, a new efficient and sparse method ...
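The Sinkhorn component referenced in the name can be illustrated by Sinkhorn normalization, which alternates row and column normalization to produce an approximately doubly stochastic (soft permutation) matrix. The sketch below shows only that normalization step under assumed shapes, not the full sparse attention mechanism from the paper.

```python
import numpy as np

def sinkhorn(scores, n_iters=20):
    """Sinkhorn normalization (sketch): alternate row and column normalization
    to turn a score matrix into an approximately doubly stochastic matrix."""
    log_p = scores.copy()
    for _ in range(n_iters):
        log_p -= np.log(np.exp(log_p).sum(axis=1, keepdims=True))  # normalize rows
        log_p -= np.log(np.exp(log_p).sum(axis=0, keepdims=True))  # normalize columns
    return np.exp(log_p)

rng = np.random.default_rng(0)
p = sinkhorn(rng.normal(size=(4, 4)))
print(p.sum(axis=0), p.sum(axis=1))  # both close to all-ones
```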