Using Abduction in Markov Logic Networks for Root Cause Analysis

11/18/2015
by Joerg Schoenfisch, et al.

IT infrastructure is a crucial part of most of today's business operations. High availability, reliability, and short response times to outages are essential. Extensive tool support and automation in risk management are therefore desirable to reduce outages. We propose a new approach for calculating the root cause of an observed failure in an IT infrastructure. Our approach is based on abduction in Markov Logic Networks. Abduction aims to find an explanation for a given observation in light of some background knowledge. In failure diagnosis, the explanation corresponds to the root cause, the observation to the failure of a component, and the background knowledge to the dependency graph extended by potential risks. Because abductive reasoning is not natively supported in this formalism, we apply a method that extends a Markov Logic Network to enable it. Our approach is highly reusable and enables users without specific knowledge of a concrete infrastructure to gain viable insights when an incident occurs. We implemented the method in a tool and illustrate its suitability for root cause analysis by applying it to a sample scenario.
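
The mapping described in the abstract (observation = failed component, background knowledge = dependency graph plus risks, explanation = root cause) can be illustrated with a minimal Python sketch. This is not the authors' method and does not use a Markov Logic Network engine: it swaps in brute-force enumeration over candidate risks, scored by assumed independent prior probabilities. The component names, risk names, probabilities, and the functions `failed_components` and `explain` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical dependency graph: component -> components it depends on.
DEPENDS_ON = {
    "web_shop": ["app_server"],
    "app_server": ["database", "network_switch"],
    "database": ["network_switch"],
    "network_switch": [],
}

# Hypothetical risks: (risk name, affected component) -> assumed prior probability.
RISKS = {
    ("disk_failure", "database"): 0.05,
    ("power_loss", "network_switch"): 0.01,
    ("software_bug", "app_server"): 0.10,
}


def failed_components(active_risks):
    """Return all components that fail when the given risks occur.

    A component fails if a risk hits it directly or if any component
    it depends on has failed (failures propagate along dependencies).
    """
    failed = {component for _, component in active_risks}
    changed = True
    while changed:
        changed = False
        for component, deps in DEPENDS_ON.items():
            if component not in failed and any(d in failed for d in deps):
                failed.add(component)
                changed = True
    return failed


def explain(unavailable, available=(), max_size=1):
    """Rank risk sets that are consistent with the observations.

    `unavailable` lists components observed as failed, `available` lists
    components observed as working. Each candidate is scored by the joint
    prior of its risks occurring and all other risks not occurring,
    assuming the risks are independent.
    """
    candidates = []
    for size in range(1, max_size + 1):
        for risks in combinations(RISKS, size):
            failed = failed_components(risks)
            explains_failures = set(unavailable) <= failed
            contradicts_evidence = bool(set(available) & failed)
            if explains_failures and not contradicts_evidence:
                score = 1.0
                for risk, prior in RISKS.items():
                    score *= prior if risk in risks else (1.0 - prior)
                candidates.append((score, risks))
    return sorted(candidates, key=lambda c: c[0], reverse=True)


if __name__ == "__main__":
    # Observation: the web shop is down, but the database is known to work.
    for score, risks in explain(["web_shop"], available=["database"]):
        print(f"{score:.5f}", risks)
```

In this toy setting, observing that a component is still available prunes candidate explanations, which is the kind of additional evidence a user could supply during an incident. A Markov Logic Network would instead encode the dependencies and risks as weighted first-order formulas and find the most probable explanation by inference rather than by enumeration.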
