BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes

by Naji Dmeiri, et al.

Fault-detection, localization, and repair methods are vital to software quality, but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably reproducible faults and fixes are vital to good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern continuous-integration (CI) approaches, like Travis-CI, which are widely used, fully configurable, and executed within custom-built containers, promise a path toward much larger defect datasets. If we can identify and archive failing and subsequent passing runs, the containers will provide substantial assurance of durable future reproducibility of build and test. Several obstacles, however, must be overcome to make this a practical reality. We describe BugSwarm, a toolset that navigates these obstacles to enable the creation of a scalable, diverse, realistic, continuously growing set of durably reproducible failing and passing versions of real-world, open-source systems. The BugSwarm toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers. Furthermore, the toolkit can be run periodically to detect fail-pass activities, thus growing the dataset continually.
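
The core idea of identifying a failing run and the subsequent passing run can be illustrated with a minimal sketch. This is not BugSwarm's implementation; the `Build` record, its field names, the `"failed"`/`"passed"` status labels, and the `find_fail_pass_pairs` helper are all hypothetical, and the sketch assumes that build IDs increase over time and that a pair is formed only from consecutive builds on the same branch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Build:
    """Hypothetical CI build record (field names are assumptions)."""
    build_id: int  # assumed to increase chronologically
    branch: str
    status: str    # assumed labels: "passed" or "failed"


def find_fail_pass_pairs(builds: Iterable[Build]) -> List[Tuple[Build, Build]]:
    """Return (failed, passed) pairs of consecutive builds on the same branch."""
    pairs: List[Tuple[Build, Build]] = []
    last_on_branch: Dict[str, Build] = {}
    for build in sorted(builds, key=lambda b: b.build_id):
        previous = last_on_branch.get(build.branch)
        # A candidate pair is a failed build immediately followed by a passed one.
        if previous and previous.status == "failed" and build.status == "passed":
            pairs.append((previous, build))
        last_on_branch[build.branch] = build
    return pairs


history = [
    Build(1, "main", "passed"),
    Build(2, "main", "failed"),
    Build(3, "dev", "failed"),
    Build(4, "main", "passed"),
    Build(5, "dev", "failed"),
    Build(6, "dev", "passed"),
]
pair_ids = [(f.build_id, p.build_id) for f, p in find_fail_pass_pairs(history)]
print(pair_ids)
```

Running a function like this periodically over newly completed builds is one way to picture how such a dataset could keep growing; actually reproducing each pair inside a container is a separate step that this sketch does not attempt.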

