Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

by   Mayee F. Chen, et al.

The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of the associated data. First, using both synthetic and real data, we demonstrate that these ordered skill sets exist, and that their existence enables more advanced skills to be learned with less data when we train on their prerequisite skills. Second, using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for both continual pre-training and fine-tuning regimes, where the objective is to efficiently learn multiple skills in the former and an individual skill in the latter. On the LEGO synthetic in the continual pre-training setting, Skill-It obtains 36.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, Skill-It reduces the validation loss on the target skill by 13.6 We apply our skills framework on the recent RedPajama dataset to continually pre-train a 3B-parameter LM, achieving higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline approach of sampling uniformly over data sources with 3B tokens.


page 3

page 19

page 20

page 21

page 22

page 23

page 24


One Model, Multiple Tasks: Pathways for Natural Language Understanding

This paper presents a Pathways approach to handle many tasks at once. Ou...

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

Evaluation of Large Language Models (LLMs) is challenging because aligni...

Finding Skill Neurons in Pre-trained Transformer-based Language Models

Transformer-based pre-trained language models have demonstrated superior...

SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling

Pre-training robot policies with a rich set of skills can substantially ...

A Theory for Emergence of Complex Skills in Language Models

A major driver of AI products today is the fact that new skills emerge i...

On the Relationship between Skill Neurons and Robustness in Prompt Tuning

Prompt Tuning is a popular parameter-efficient finetuning method for pre...

Voyager: An Open-Ended Embodied Agent with Large Language Models

We introduce Voyager, the first LLM-powered embodied lifelong learning a...

Please sign up or login with your details

Forgot password? Click here to reset