Sharpness-aware minimization (SAM) is a recently proposed method that mi...
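To make the two-step update behind sharpness-aware minimization concrete, here is a minimal NumPy sketch under generic assumptions: the `loss_grad` callable, the `rho` radius, the learning rate, and the toy quadratic objective are illustrative placeholders, not details taken from this abstract.

```python
import numpy as np

def sam_step(w, loss_grad, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) step (illustrative sketch).

    1. Take the gradient at the current weights.
    2. Move to the approximate worst-case nearby point w + rho * g / ||g||.
    3. Update the original weights using the gradient taken at that point.
    """
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent direction within the rho-ball
    g_adv = loss_grad(w + eps)                    # gradient at the perturbed weights
    return w - lr * g_adv                         # descend with the sharpness-aware gradient

# Toy quadratic objective, purely for demonstration.
loss_grad = lambda w: 2.0 * (w - 3.0)
w = np.array([0.0])
for _ in range(50):
    w = sam_step(w, loss_grad)
print(w)  # ends up near the minimizer at 3.0
```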
State-of-the-art neural models typically encode document-query pairs usi...
The allure of superhuman-level capabilities has led to considerable inte...
Self-supervised contrastive representation learning has proved incredibl...
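As a concrete reference point for contrastive representation learning, the sketch below implements an InfoNCE-style objective on two augmented views of a batch. The function name, temperature value, and random embeddings are illustrative assumptions, not the setup of this particular paper.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style contrastive loss on two views of the same batch (sketch).

    z1[i] and z2[i] embed two augmentations of example i; every z2[j] with
    j != i acts as a negative for z1[i].
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                   # pairwise cosine similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(z1))                           # positives sit on the diagonal
    return -log_probs[idx, idx].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(info_nce(z + 0.01 * rng.normal(size=z.shape), z))
```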
State-of-the-art models in natural language processing rely on separate ...
In real-world systems, models are frequently updated as more data become...
In the era of pre-trained language models, Transformers are the de facto...
When experiencing an information need, users want to engage with an expe...
This paper proposes Omnidirectional Representations from Transformers (O...
Training modern neural networks is an inherently noisy process that can ...
Detecting out-of-distribution (OOD) examples is critical in many applica...
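For context, one widely used OOD-scoring baseline is the maximum softmax probability. The sketch below shows that baseline only; the threshold and example logits are invented for illustration and are not drawn from this abstract.

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability (MSP) OOD score: lower means more likely OOD."""
    z = logits - logits.max(axis=1, keepdims=True)     # stabilize the softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

# Flag inputs whose confidence falls below a threshold chosen on held-out data.
logits = np.array([[5.0, 0.1, 0.2],      # confidently in-distribution
                   [0.4, 0.5, 0.45]])    # ambiguous, possibly OOD
print(msp_score(logits) < 0.6)           # -> [False  True]
```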
There are two major classes of natural language grammars – the dependenc...
Transformers do not scale very well to long sequence lengths largely bec...
Work in information retrieval has largely been centered around ranking a...
Transformer model architectures have garnered immense interest lately du...
Large generative language models such as GPT-2 are well-known for their ...
Achieving state-of-the-art performance on natural language understanding...
The dot product self-attention is known to be central and indispensable ...
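Since this sentence centers on dot-product self-attention, a single-head scaled dot-product self-attention layer is sketched below in NumPy; the projection matrices, sequence length, and model dimension are illustrative assumptions.

```python
import numpy as np

def dot_product_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention (illustrative sketch)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq, seq) pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # mix values by attention weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                            # 4 tokens, model dim 8
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(dot_product_self_attention(x, wq, wk, wv).shape)  # (4, 8)
```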
Modern machine learning models are often trained on examples with noisy ...
Work in information retrieval has traditionally focused on ranking and r...
This paper seeks to develop a deeper understanding of the fundamental pr...
We propose Sparse Sinkhorn Attention, a new efficient and sparse method ...
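The Sinkhorn component referenced in the name can be illustrated by Sinkhorn normalization, which alternates row and column normalization to produce an approximately doubly stochastic (soft permutation) matrix. The sketch below shows only that normalization step under assumed shapes, not the full sparse attention mechanism from the paper.

```python
import numpy as np

def sinkhorn(scores, n_iters=20):
    """Sinkhorn normalization (sketch): alternate row and column normalization
    to turn a score matrix into an approximately doubly stochastic matrix."""
    log_p = scores.copy()
    for _ in range(n_iters):
        log_p -= np.log(np.exp(log_p).sum(axis=1, keepdims=True))  # normalize rows
        log_p -= np.log(np.exp(log_p).sum(axis=0, keepdims=True))  # normalize columns
    return np.exp(log_p)

rng = np.random.default_rng(0)
p = sinkhorn(rng.normal(size=(4, 4)))
print(p.sum(axis=0), p.sum(axis=1))  # both close to all-ones
```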