LMentry: A Language Model Benchmark of Elementary Language Tasks

by Avia Efrat, et al.

As the performance of large language models rapidly improves, benchmarks are getting larger and more complex as well. We present LMentry, a benchmark that avoids this "arms race" by focusing on a compact set of tasks that are trivial to humans, e.g. writing a sentence containing a specific word, identifying which words in a list belong to a specific category, or choosing which of two words is longer. LMentry is specifically designed to provide quick and interpretable insights into the capabilities and robustness of large language models. Our experiments reveal a wide variety of failure cases that, while immediately obvious to humans, pose a considerable challenge for large language models, including OpenAI's latest 175B-parameter instruction-tuned model, TextDavinci002. LMentry complements contemporary evaluation approaches of large language models, providing a quick, automatic, and easy-to-run "unit test", without resorting to large benchmark suites of complex tasks.
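To make the "unit test" idea concrete, here is a minimal sketch of how two of the tasks named above (writing a sentence containing a specific word, and choosing which of two words is longer) could be checked automatically. It is an illustration under stated assumptions, not LMentry's actual scoring code: the function names `contains_word` and `longer_word_is_correct` are hypothetical, and the whole-word, case-insensitive matching rule is an assumption made for this example.

```python
import re


def contains_word(sentence: str, word: str) -> bool:
    """Return True if `word` occurs in `sentence` as a whole word.

    Assumption for this sketch: matching is case-insensitive and a
    substring inside a longer word (e.g. "cat" in "concatenate") does
    not count.
    """
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def longer_word_is_correct(answer: str, word_a: str, word_b: str) -> bool:
    """Return True if `answer` names the longer word and not the shorter one.

    Hypothetical scoring rule: the model's free-text answer must mention
    the longer word and must not mention the shorter one.
    """
    if len(word_a) == len(word_b):
        raise ValueError("the two words must differ in length")
    if len(word_a) > len(word_b):
        longer, shorter = word_a, word_b
    else:
        longer, shorter = word_b, word_a
    return contains_word(answer, longer) and not contains_word(answer, shorter)
```

Checks of this kind are cheap to run and easy to inspect, which is the property the abstract emphasizes. A real scorer would probably need to accept more answer phrasings than this sketch does.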




