Building models that can be rapidly adapted to numerous tasks using only...
We investigate the optimal model size and number of tokens for training ...
The performance of a language model has been shown to be effectively mod...
We enhance auto-regressive language models by conditioning on document c...