Evaluating How Fine-tuning on Bimodal Data Effects Code Generation

11/15/2022
by   Gabriel Orlanski, et al.
0

Despite the increase in popularity of language models for code generation, it is still unknown how training on bimodal coding forums affects a model's code generation performance and reliability. We, therefore, collect a dataset of over 2.2M StackOverflow questions with answers for finetuning. These fine-tuned models have average pass@k improvements of 54.64 (Chen et al., 2021) and Mostly Basic Program Problems (Austin et al., 2021) tasks, respectively. This regime further decreases the number of generated programs with both syntax and runtime errors. However, we find that at higher temperatures, there are significant decreases to the model's ability to generate runnable programs despite higher pass@k scores, underscoring the need for better methods of incorporating such data that mitigate these side effects. The code can be found https://github.com/gabeorlanski/bimodalcode-generation

READ FULL TEXT
research
06/08/2020

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

Fine-tuning pre-trained transformer-based language models such as BERT h...
research
12/13/2022

Benchmarking Large Language Models for Automated Verilog RTL Code Generation

Automating hardware design could obviate a significant amount of human e...
research
05/20/2021

Measuring Coding Challenge Competence With APPS

While programming is one of the most broadly applicable skills in modern...
research
10/28/2021

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5

We present the winning entry to the Multilingual Lexical Normalization (...
research
08/21/2023

Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning

Numerical reasoning is vital for natural language processing models to u...
research
08/16/2021

Program Synthesis with Large Language Models

This paper explores the limits of the current generation of large langua...
research
08/19/2023

Inductive-bias Learning: Generating Code Models with Large Language Model

Large Language Models(LLMs) have been attracting attention due to a abil...

Please sign up or login with your details

Forgot password? Click here to reset