Characterizing and Mitigating Anti-patterns of Alerts in Industrial Cloud Systems

by Tianyi Yang, et al.

Alerts are crucial for requesting prompt human intervention upon cloud anomalies. The quality of alerts significantly affects cloud reliability and the cloud provider's business revenue. In practice, we observe that on-call engineers are hindered from quickly locating and fixing faulty cloud services by the large number of misleading, non-informative, and non-actionable alerts. We refer to this ineffectiveness of alerts as "anti-patterns of alerts". To better understand these anti-patterns and provide actionable measures to mitigate them, we conduct the first empirical study on the practices of mitigating anti-patterns of alerts in an industrial cloud system. We study the alert strategies and the alert processing procedure at Huawei Cloud, a leading cloud provider. Our study combines a quantitative analysis of millions of alerts spanning two years with a survey of eighteen experienced engineers. As a result, we summarize four individual anti-patterns and two collective anti-patterns of alerts. We also summarize four current reactions for mitigating the anti-patterns of alerts, along with general preventative guidelines for configuring alert strategies. Lastly, we propose exploring the automatic evaluation of the Quality of Alerts (QoA), including the indicativeness, precision, and handleability of alerts, as a future research direction that would assist in automatically detecting alerts' anti-patterns. The findings of our study are valuable for optimizing cloud monitoring systems and improving the reliability of cloud services.




