Pretraining on the Test Set Is All You Need

09/13/2023
by Rylan Schaeffer et al.

Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high-quality, non-synthetic data mixture based solely on evaluation benchmarks. Using this mixture of fewer than 100 thousand tokens, we pretrain a 1 million parameter Transformer-based LLM, phi-CTNL (pronounced "fictional"), that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. phi-CTNL also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.
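To make the satirized recipe concrete, below is a minimal, purely illustrative Python (PyTorch) sketch of the setup the abstract describes: assemble the "pretraining" corpus directly from benchmark test items, then fit a tiny causal Transformer to it. This is not the authors' code (none is described in the abstract); the benchmark items, model sizes, and hyperparameters are all placeholder assumptions, and driving next-token loss to zero on held-out test items is precisely the test-set contamination the paper lampoons.

    # Illustrative sketch only: "pretraining on the test set" as satirized above.
    # All data, names, and hyperparameters below are hypothetical placeholders.
    import torch
    import torch.nn as nn

    # Hypothetical "carefully curated, non-synthetic data mixture":
    # the evaluation benchmark's own test items.
    benchmark_test_items = [
        "Q: What is 2 + 2? A: 4",
        "Q: What is the capital of France? A: Paris",
    ]
    corpus = "\n".join(benchmark_test_items)

    # Character-level tokenizer keeps the example self-contained.
    vocab = sorted(set(corpus))
    stoi = {ch: i for i, ch in enumerate(vocab)}
    ids = torch.tensor([stoi[ch] for ch in corpus])

    class TinyCausalLM(nn.Module):
        """A small decoder-only LM; sizes are illustrative, not the paper's."""
        def __init__(self, vocab_size, d_model=64, n_heads=4, n_layers=2, max_len=256):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                               batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, x):
            t = x.size(1)
            h = self.tok(x) + self.pos(torch.arange(t, device=x.device))
            # Causal mask so each position attends only to earlier tokens.
            mask = nn.Transformer.generate_square_subsequent_mask(t).to(x.device)
            h = self.blocks(h, mask=mask)
            return self.head(h)

    model = TinyCausalLM(len(vocab))
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()

    # Next-token prediction on the test set itself: minimizing this loss
    # amounts to memorizing the benchmark answers.
    x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)
    for step in range(200):
        logits = model(x)
        loss = loss_fn(logits.flatten(0, 1), y.flatten())
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"final memorization loss: {loss.item():.4f}")

Because the training distribution equals the evaluation distribution, "benchmark accuracy" here measures nothing but memorization, which is the joke: apparent wins over power-law scaling follow trivially once the test set is in the pretraining mixture.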

Related research

09/13/2023  EarthPT: a foundation model for Earth Observation
We introduce EarthPT – an Earth Observation (EO) pretrained transformer...

09/20/2023  The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
The Languini Kitchen serves as both a research collective and codebase d...

09/06/2020  Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines
We use pretrained transformer-based language models in SemEval-2020 Task...

06/01/2023  The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Large language models are commonly trained on a mixture of filtered web ...

06/27/2023  HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Genomic (DNA) sequences encode an enormous amount of information for gen...

09/19/2023  A Family of Pretrained Transformer Language Models for Russian
Nowadays, Transformer language models (LMs) represent a fundamental comp...

12/21/2022  Uncontrolled Lexical Exposure Leads to Overestimation of Compositional Generalization in Pretrained Models
Human linguistic capacity is often characterized by compositionality and...
