Can we learn from developer mistakes? Learning to localize and repair real bugs from real bug fixes

07/01/2022
by Cedric Richter, et al.

Real bug fixes found in open source repositories seem to be an ideal source for learning to localize and repair real bugs. However, the absence of large-scale bug fix collections has so far made it difficult to exploit real bug fixes effectively when training larger neural models. In contrast, artificial bugs, produced by mutating existing source code, can easily be obtained at sufficient scale and are therefore often preferred for training existing approaches. Still, localization and repair models trained on artificial bugs usually underperform when faced with real bugs. This raises the question of whether bug localization and repair models trained on real bug fixes are more effective at localizing and repairing real bugs. We address this question by introducing RealiT, a pre-train-and-fine-tune approach for effectively learning to localize and repair real bugs from real bug fixes. RealiT is first pre-trained on a large number of artificial bugs produced by traditional mutation operators and then fine-tuned on a smaller set of real bug fixes. Fine-tuning does not require any modification of the learning algorithm and can therefore be easily adopted in various training scenarios for bug localization or repair, even when real training data is scarce. In addition, we found that training on real bug fixes with RealiT is empirically powerful: it nearly doubles the localization performance of an existing model on real bugs while maintaining or even improving its repair performance.
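To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not RealiT's actual implementation. It illustrates (1) how an artificial bug can be produced by mutating correct code, and (2) the order in which artificial and real examples would be presented in a pre-train-then-fine-tune schedule. The comparison-swap operator, the names `SwapComparison`, `make_artificial_bug`, and `training_schedule`, and the `buggy`/`fixed` record layout are all assumptions chosen for illustration; the abstract does not specify which mutation operators or data format are used.

```python
import ast
import random

# Hypothetical mutation table (an assumption for this sketch): each
# comparison operator is replaced by a plausible "near miss".
SWAPS = {
    ast.Lt: ast.LtE, ast.LtE: ast.Lt,
    ast.Gt: ast.GtE, ast.GtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
}


class SwapComparison(ast.NodeTransformer):
    """Swap the comparison operator at exactly one mutation site."""

    def __init__(self, target):
        self.target = target  # index of the site to mutate
        self.seen = 0         # number of sites visited so far

    def visit_Compare(self, node):
        self.generic_visit(node)  # handle nested comparisons first
        for i, op in enumerate(node.ops):
            if type(op) in SWAPS:
                if self.seen == self.target:
                    node.ops[i] = SWAPS[type(op)]()
                self.seen += 1
        return node


def count_sites(tree):
    """Count comparison operators that the table above can mutate."""
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Compare)
        for op in node.ops
        if type(op) in SWAPS
    )


def make_artificial_bug(source, rng):
    """Return a {'buggy', 'fixed'} pair, or None if nothing can be mutated."""
    tree = ast.parse(source)
    sites = count_sites(tree)
    if sites == 0:
        return None
    mutated = SwapComparison(rng.randrange(sites)).visit(tree)
    return {
        "buggy": ast.unparse(mutated),            # requires Python 3.9+
        "fixed": ast.unparse(ast.parse(source)),  # normalized original
    }


def training_schedule(artificial, real):
    """Yield all artificial examples first, then the real bug fixes."""
    for example in artificial:
        yield ("pretrain", example)
    for example in real:
        yield ("finetune", example)


if __name__ == "__main__":
    rng = random.Random(0)
    pair = make_artificial_bug("def f(a, b):\n    return a < b\n", rng)
    print(pair["buggy"])
    print(pair["fixed"])
```

In this sketch the second phase reuses the same example format as the first, which mirrors the abstract's point that fine-tuning requires no change to the learning algorithm: only the data source changes.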

research
01/28/2022

TSSB-3M: Mining single statement bugs at massive scale

Single statement bugs are one of the most important ingredients in the e...
research
07/19/2019

On Usefulness of the Deep-Learning-Based Bug Localization Models to Practitioners

Background: Developers spend a significant amount of time and efforts to...
research
07/14/2021

DeepMutants: Training neural bug detectors with contextual mutations

Learning-based bug detectors promise to find bugs in large code bases by...
research
07/21/2022

BigIssue: A Realistic Bug Localization Benchmark

As machine learning tools progress, the inevitable question arises: How ...
research
12/03/2021

Can OpenAI Codex and Other Large Language Models Help Us Fix Security Bugs?

Human developers can produce code with cybersecurity weaknesses. Can eme...
research
05/16/2021

SLGPT: Using Transfer Learning to Directly Generate Simulink Model Files and Find Bugs in the Simulink Toolchain

Finding bugs in a commercial cyber-physical system (CPS) development too...
research
12/18/2018

Learning to Generate Corrective Patches using Neural Machine Translation

Bug fixing is generally a manually-intensive task. However, recent work ...
