InstructEval: Systematic Evaluation of Instruction Selection Methods

by   Anirudh Ajith, et al.

In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that the precise details of the inputs used in the prompt significantly impacts ICL, which has incentivized instruction selection algorithms. The effect of instruction-choice however is severely underexplored, with existing analyses being restricted to shallow subsets of models and tasks, which limits the generalizability of their insights. We develop an ICL evaluation suite to conduct a thorough assessment of these techniques. The suite includes 13 open-sourced LLMs of varying scales from 4 distinct model families and covers 9 different tasks, representing a range of task types across 3 categories. In this work, we evaluate the relative performance of 7 popular instruction selection methods using our benchmark over five desiderata relevant to ICL. We discover that using curated manually-written instructions and simple instructions without any task-specific descriptions often elicits superior ICL performance than that of automatic instruction-induction methods, pointing to a lack of generalizability among the latter. We release our evaluation suite for benchmarking instruction selection approaches, and call for more rigorous and generalizable methods in this space.


page 1

page 2

page 3

page 4


Instruction Induction: From Few Examples to Natural Language Task Descriptions

Large language models are able to perform a task by conditioning on a fe...

OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

Recent work has shown that fine-tuning large pre-trained language models...

How Many Data Samples is an Additional Instruction Worth?

Recently introduced instruction-paradigm empowers non-expert users to le...

The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V

This report makes the case that a well-designed Reduced Instruction Set ...

LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

Large language models (LLMs) with instruction finetuning demonstrate sup...

Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning

Task semantics can be expressed by a set of input-to-output examples or ...

Further Decimating the Inductive Programming Search Space with Instruction Digrams

Overlapping instruction subsets derived from human originated code have ...

Please sign up or login with your details

Forgot password? Click here to reset