OctoPack: Instruction Tuning Code Large Language Models

08/14/2023
by Niklas Muennighoff, et al.

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs, on the HumanEval Python benchmark (46.2 pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. Code, models and data are freely available at https://github.com/bigcode-project/octopack.
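
The core idea is to treat each Git commit as a ready-made instruction example: the commit message is the human instruction, the file before the change is the input, and the file after the change is the target. A minimal sketch of that pairing follows; the function and field names here are illustrative assumptions, not the released CommitPack schema.

```python
def commit_to_example(message: str, old_code: str, new_code: str) -> dict:
    """Hypothetical sketch of the CommitPack idea: pair a commit's
    human-written message with its code change. Field names are
    illustrative, not the dataset's actual schema."""
    return {
        "instruction": message.strip(),  # e.g. "Fix off-by-one in loop bound"
        "input": old_code,               # file contents before the commit
        "output": new_code,              # file contents after the commit
    }
```

The reported 46.2 pass@1 uses the standard unbiased pass@k estimator from the HumanEval benchmark (Chen et al., 2021): generate n samples per problem, count the c that pass the unit tests, and estimate the probability that at least one of k submitted samples succeeds.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    n: total samples generated per problem
    c: samples that pass the unit tests
    k: samples "submitted" per problem (k = 1 for pass@1)
    """
    if n - c < k:  # every size-k draw must contain a passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k = 1 this reduces to c / n, i.e. the fraction of generations that pass the tests.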

Related research

08/31/2023 · Can Programming Languages Boost Each Other via Instruction Tuning?
When human programmers have mastered a programming language, it would be...

06/26/2023 · InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
Humans write code in a fundamentally interactive manner and rely on cons...

03/14/2022 · GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
Providing natural language instructions in prompts is a useful new parad...

04/26/2023 · Exploring the Curious Case of Code Prompts
Recent work has shown that prompting language models with code-like repr...

08/24/2023 · Code Llama: Open Foundation Models for Code
We release Code Llama, a family of large language models for code based ...

08/03/2023 · ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
In this work, we make the first attempt to evaluate LLMs in a more chall...

05/15/2023 · Symbol tuning improves in-context learning in language models
We present symbol tuning - finetuning language models on in-context inpu...
