DataExposer: Exposing Disconnect between Data and Systems

by Sainyam Galhotra, et al.

As data is a central component of many modern systems, the cause of a system malfunction may reside in the data and, specifically, in particular properties of the data. For example, a health-monitoring system that is designed under the assumption that weight is reported in imperial units (lbs) will malfunction when encountering weight reported in metric units (kilograms). Similar to software debugging, which aims to find bugs in the mechanism (source code or runtime conditions), our goal is to debug the data to identify potential sources of disconnect between the assumptions about the data and the systems that operate on that data. Specifically, we seek to identify which properties of the data cause a data-driven system to malfunction. We propose DataExposer, a framework to identify data properties, called profiles, that are the root causes of performance degradation or failure of a system that operates on the data. Such identification is necessary to repair the system and resolve the disconnect between data and system. Our technique is based on causal reasoning through interventions: when a system malfunctions for a dataset, DataExposer alters the data profiles and observes changes in the system's behavior due to the alteration. Unlike statistical observational analysis that reports mere correlations, DataExposer reports causally verified root causes, in terms of data profiles, of the system malfunction. We empirically evaluate DataExposer on three real-world and several synthetic data-driven systems that fail on datasets for a diverse set of reasons. In all cases, DataExposer identifies the root causes precisely while requiring orders of magnitude fewer interventions than prior techniques.
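The intervention idea described above can be illustrated with a minimal Python sketch built around the weight-units example. This is not DataExposer's algorithm or API: every name here (`system_malfunction`, `intervene_weight_units`, `intervene_name_case`, `causal_profiles`, the 90 lbs cutoff, and the 0.1 threshold) is a hypothetical choice made for illustration. The sketch also tries each candidate intervention one at a time, which is the naive strategy rather than the more economical search the abstract claims.

```python
# Hypothetical toy example; none of these names come from DataExposer itself.

def system_malfunction(rows):
    """Toy health-monitoring system that assumes weight is reported in lbs.

    Returns the fraction of records flagged as underweight (< 90),
    which serves here as an assumed malfunction score.
    """
    flagged = sum(1 for r in rows if r["weight"] < 90)
    return flagged / len(rows)


def intervene_weight_units(rows):
    """Intervention on the weight profile: convert kilograms to pounds."""
    return [{**r, "weight": r["weight"] * 2.20462} for r in rows]


def intervene_name_case(rows):
    """Intervention on an unrelated profile: upper-case the names."""
    return [{**r, "name": r["name"].upper()} for r in rows]


def causal_profiles(rows, interventions, threshold=0.1):
    """Return the interventions that bring the malfunction score to or below
    the threshold, i.e. the profiles whose alteration fixes the system."""
    if system_malfunction(rows) <= threshold:
        return []  # the system is not malfunctioning on this dataset
    causes = []
    for name, alter in interventions.items():
        if system_malfunction(alter(rows)) <= threshold:
            causes.append(name)
    return causes


# A failing dataset: weights are actually in kilograms.
failing = [
    {"name": "ann", "weight": 60.0},
    {"name": "bob", "weight": 82.0},
    {"name": "cy", "weight": 71.5},
]
interventions = {
    "weight_units": intervene_weight_units,
    "name_case": intervene_name_case,
}
root_causes = causal_profiles(failing, interventions)
print(root_causes)
```

In this toy setting only the unit conversion removes the malfunction, so only that profile is reported. Changing the case of names leaves the score unchanged, so it is not reported as a cause.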


