Making root cause analysis feasible for large code bases: a solution approach for a climate model

10/31/2018
by Daniel J. Milroy, et al.
Applications that simulate complex physical processes can be composed of millions of lines of code. For such complex model codes, finding the source of a bug or discrepancy in output (e.g., from a previous version, alternative hardware or supporting software stack) is non-trivial at best. Although there are many tools for program comprehension through debugging or slicing, few (if any) scale to a complicated model such as the Community Earth System Model (CESM™). Therefore, to enable developers to trace a problem detected in the model output to its source, we devise a series of techniques to reduce the search space to a tractable size. The first step determines which CESM output variables are most affected by a given bug. To find where these output variables are computed in the code, we construct a directed graph of internal variable relationships. We introduce a form of hybrid program slicing which integrates static, backward slicing with code coverage information to extract a subgraph consisting of relevant variable relationships. Finally, we partition the subgraph into communities, and identify nodes in each community that are central to information flow. After reducing the search space and ranking nodes by centrality, runtime variable sampling becomes feasible. We use examples of simulation output from CESM to illustrate how sampling can be performed as part of an efficient iterative refinement procedure to locate error sources. This process is also effective in complex scenarios such as sensitivity to CPU instructions.
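The slicing and ranking steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The variable names, file and line labels, and coverage set below are all hypothetical. Plain degree counting stands in for the centrality measure, which the abstract does not specify, and the community partitioning step is omitted for brevity.

```python
from collections import defaultdict, deque

# Hypothetical assignment statements: (variable defined, variables read, source location).
ASSIGNMENTS = [
    ("flux", ["temp", "pressure"], "physics.F90:10"),
    ("temp", ["t_prev", "heating"], "physics.F90:20"),
    ("heating", ["solar"], "radiation.F90:5"),
    ("pressure", ["density"], "dynamics.F90:7"),
    ("density", ["unused_mass"], "dynamics.F90:99"),  # statement never executed
    ("cloud", ["humidity"], "micro.F90:3"),           # executed, but unrelated to flux
]

# Hypothetical coverage data: source locations that actually ran.
COVERED = {
    "physics.F90:10", "physics.F90:20", "radiation.F90:5",
    "dynamics.F90:7", "micro.F90:3",
}


def build_graph(assignments, covered):
    """Build successor/predecessor maps, keeping only statements that were executed."""
    succ, pred = defaultdict(set), defaultdict(set)
    for target, sources, location in assignments:
        if location not in covered:
            continue  # the coverage filter is what makes the slice "hybrid"
        for source in sources:
            succ[source].add(target)
            pred[target].add(source)
    return succ, pred


def backward_slice(pred, outputs):
    """Collect every variable that can influence the given outputs (breadth-first)."""
    seen = set(outputs)
    queue = deque(outputs)
    while queue:
        node = queue.popleft()
        for parent in pred.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def rank_by_degree(succ, pred, nodes):
    """Rank slice nodes by in-degree plus out-degree inside the slice (a simple stand-in)."""
    scores = {
        n: len(pred.get(n, set()) & nodes) + len(succ.get(n, set()) & nodes)
        for n in nodes
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


succ, pred = build_graph(ASSIGNMENTS, COVERED)
slice_nodes = backward_slice(pred, ["flux"])  # "flux" plays the affected output variable
ranked = rank_by_degree(succ, pred, slice_nodes)

for name, score in ranked:
    print(f"{name}: {score}")
```

In this toy graph the slice drops `cloud` and `humidity` because they never feed `flux`, and drops `unused_mass` because the statement that reads it was not executed. The highest-ranked variables would be the first candidates for runtime sampling.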
