Megadiff: A Dataset of 600k Java Source Code Changes Categorized by Diff Size

08/10/2021
by Martin Monperrus, et al.

This paper presents Megadiff, a dataset of source code diffs. It focuses on Java, with strict inclusion criteria based on commit message and diff size. Megadiff contains 663 029 Java diffs that can be used for research on commit comprehension, fault localization, automated program repair, and machine learning on code changes.


