ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)

03/19/2019
by Vimalkumar Jeyakumar, et al.

We present ExplainIt!, a declarative, unsupervised root-cause analysis engine that uses time series monitoring data from large complex systems such as data centres. ExplainIt! empowers operators to succinctly specify a large number of causal hypotheses to search for causes of interesting events. ExplainIt! then ranks these hypotheses and summarises causal dependencies between hundreds of thousands of variables for human understanding. We show how a declarative language, such as SQL, can be effective in enumerating hypotheses that probe the structure of an unknown probabilistic graphical causal model of the underlying system. Our thesis is that databases are in a unique position to enable users to rapidly explore the possible causal mechanisms in data collected from diverse sources. We empirically demonstrate how ExplainIt! has helped us resolve over 30 performance issues in a commercial product since late 2014, of which we discuss a few cases in detail.


