LogShrink: Effective Log Compression by Leveraging Commonality and Variability of Log Data

by Xiaoyun Li, et al.

Log data is a crucial resource for recording system events and states during system execution. However, as systems grow in scale, the volume of generated log data has exploded, imposing substantial storage overhead (e.g., several petabytes of logs per day in production). To address this issue, log compression has become a crucial task for reducing disk storage while still supporting further log analysis. Unfortunately, existing general-purpose and log-specific compression methods make only limited use of the characteristics of log data. To overcome these limitations, we conduct an empirical study and derive three major observations on the characteristics of log data that can facilitate log compression. Based on these observations, we propose LogShrink, a novel and effective log compression method that leverages the commonality and variability of log data. We propose an analyzer based on longest common subsequence and entropy techniques to identify the latent commonality and variability in log messages. The key idea is that this commonality and variability can be exploited to represent log data more compactly. In addition, we introduce a clustering-based sequence sampler to accelerate the commonality and variability analyzer. Extensive experimental results demonstrate that LogShrink can exceed baselines in compression ratio by 16[…] reasonable compression speed.
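The abstract does not spell out how the analyzer works internally. As a rough illustration only, assuming whitespace tokenization and a group of messages that share one template, the Python sketch below folds a token-level longest common subsequence (LCS) across the group to obtain its shared constant tokens (commonality), treats the remaining tokens as variable fields, and scores each field by Shannon entropy (variability). The functions `lcs_tokens`, `split_message`, `shannon_entropy`, and `analyze`, and the HDFS-style sample messages, are hypothetical and are not taken from LogShrink.

```python
import math
from collections import Counter

# Hypothetical sketch, NOT LogShrink's implementation: it only illustrates
# LCS-based commonality and entropy-based variability on toy log messages.


def lcs_tokens(a, b):
    """Return the longest common subsequence of two token lists (DP)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack from the bottom-right cell to recover the tokens in order.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def split_message(tokens, template):
    """Greedily align tokens to the template (a subsequence of tokens).

    Slot k collects the non-template tokens seen after template[k-1] is
    matched and before template[k] is. Greedy matching can misplace a
    variable whose value happens to equal a template token.
    """
    slots = [[] for _ in range(len(template) + 1)]
    k = 0
    for tok in tokens:
        if k < len(template) and tok == template[k]:
            k += 1
        else:
            slots[k].append(tok)
    return [" ".join(s) for s in slots]


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    return sum(
        -(c / total) * math.log2(c / total) for c in Counter(values).values()
    )


def analyze(messages):
    """Return (template, variable columns, per-column entropy)."""
    token_lists = [m.split() for m in messages]
    # Fold pairwise LCS over the group; the result can depend on order.
    template = token_lists[0]
    for toks in token_lists[1:]:
        template = lcs_tokens(template, toks)
    rows = [split_message(toks, template) for toks in token_lists]
    # Transpose to one column per slot, dropping slots empty in every message.
    columns = [list(col) for col in zip(*rows) if any(col)]
    return template, columns, [shannon_entropy(col) for col in columns]


logs = [
    "Received block blk_1 of size 67108864 from 10.0.0.1",
    "Received block blk_2 of size 67108864 from 10.0.0.2",
    "Received block blk_3 of size 3553 from 10.0.0.1",
]
template, columns, entropies = analyze(logs)
print(template)  # ['Received', 'block', 'of', 'size', 'from']
for col, h in zip(columns, entropies):
    print(col, round(h, 3))
```

Here the shared template is stored once, while the block-ID column has the highest entropy (about 1.585 bits) and the size and IP columns, which repeat values, score about 0.918 bits each. Lower-entropy fields are natural candidates for shorter, dictionary-style representations; this sketch does not attempt to reproduce the encoding or the clustering-based sampler that LogShrink actually uses.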




Log Parsing with Prompt-based Few-shot Learning

Logs generated by large-scale software systems provide crucial informati...

Log(Graph): A Near-Optimal High-Performance Graph Representation

Today's graphs used in domains such as machine learning or social networ...

Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression

System logs record detailed runtime information of software systems and ...

Polynomial data compression for large-scale physics experiments

The new generation research experiments will introduce huge data surge t...

AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection

The rapid progress of modern computing systems has led to a growing inte...

SCI: A spectrum concentrated implicit neural compression for biomedical data

Massive collection and explosive growth of the huge amount of medical da...

Query Log Compression for Workload Analytics

Analyzing database access logs is a key part of performance tuning, intr...
