Practical data monitoring in the internet-services domain

03/15/2022
by Nikhil Galagali, et al.

Large-scale monitoring, anomaly detection, and root cause analysis of metrics are essential requirements of the internet-services industry. To meet the need to continuously monitor millions of metrics, large internet-based companies use many anomaly detection approaches on a daily basis. However, despite significant progress in detecting anomalies in metrics accurately and efficiently, the sheer number of metrics means that a large number of false alarms still need to be investigated. This paper presents a framework for reliable large-scale anomaly detection. It is significantly more accurate than existing approaches and allows for easy interpretation of models, thus enabling practical data monitoring in the internet-services domain.

research
08/19/2023

Practical Anomaly Detection over Multivariate Monitoring Metrics for Online Services

As modern software systems continue to grow in terms of complexity and v...
research
09/04/2023

Drifter: Efficient Online Feature Monitoring for Improved Data Integrity in Large-Scale Recommendation Systems

Real-world production systems often grapple with maintaining data qualit...
research
09/14/2022

Analytics and Machine Learning Powered Wireless Network Optimization and Planning

It is important that the wireless network is well optimized and planned,...
research
02/12/2018

Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications

To ensure undisrupted business, large Internet companies need to closely...
research
02/23/2016

Finding Needle in a Million Metrics: Anomaly Detection in a Large-scale Computational Advertising Platform

Online media offers opportunities to marketers to deliver brand messages...
research
10/26/2022

A Hierarchical Approach to Conditional Random Fields for System Anomaly Detection

Anomaly detection to recognize unusual events in large scale systems in ...
research
06/23/2020

Lumos: A Library for Diagnosing Metric Regressions in Web-Scale Applications

Web-scale applications can ship code on a daily to weekly cadence. These...
