[Paper] KELP: Robust Online Log Parsing Through Evolutionary Grouping Trees

Published: January 2, 2026 at 05:27 AM EST

Source: arXiv - 2601.00633v1

Overview

The paper presents KELP (Kelp Evolutionary Log Parser), a high‑throughput online log‑parsing system that stays accurate as log formats drift in production. Instead of static template models, KELP maintains a continuously evolving “Evolutionary Grouping Tree” that discovers, splits, and merges log templates on the fly, which makes it less brittle than existing parsers when formats change.

Key Contributions

Evolutionary Grouping Tree (EGT) – a novel data structure that treats log‑template discovery as an online clustering problem, allowing nodes to evolve as new log lines arrive.
Robust online parsing algorithm – KELP updates its tree in real time, handling schema drifts without manual re‑training or rule changes.
Realistic benchmark dataset – the authors construct a new evaluation suite that captures structural ambiguity and frequent format changes seen in modern production systems, addressing shortcomings of existing static, regex‑based benchmarks.
Empirical validation – experiments show KELP retains > 95 % parsing accuracy on the new benchmark while processing millions of log lines per second, outperforming state‑of‑the‑art heuristic parsers.
Open‑source release – full implementation and benchmark data are publicly available (codeberg.org/stonebucklabs/kelp), enabling reproducibility and community extensions.

Methodology

Streaming Ingestion – Each incoming log line is tokenized and fed into the EGT.
Tree Navigation – The line traverses the tree based on token similarity scores; leaf nodes represent current candidate templates.
Evolution Operations
- Split: if a leaf node’s internal variance exceeds a threshold, it is split into more specific child nodes.
- Merge: rarely used nodes that become similar over time are merged to avoid fragmentation.
- Re‑evaluation: node frequencies are continuously updated, allowing the tree to adapt to shifting log‑type distributions.
Template Extraction – When a leaf node stabilizes (low variance, high frequency), its pattern is emitted as a concrete log template for downstream analytics.
Benchmark Construction – The authors replay logs from a large‑scale microservice deployment, deliberately injecting schema changes (new fields, reordered tokens, optional sections) to mimic real‑world drift. Ground truth is generated by a semi‑automated labeling pipeline rather than static regexes.
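
The ingestion, navigation, and template-extraction steps above can be sketched as a small online grouper. This is a minimal illustration, not the authors' implementation: the summary does not specify KELP's similarity function or tree layout, so the positional token-overlap score, the bucketing by token count, the `<*>` wildcard marker, and the names `OnlineGrouper`, `Group`, and `threshold` are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

WILDCARD = "<*>"  # hypothetical marker for a variable token position


def tokenize(line: str) -> list[str]:
    """Split a log line on whitespace (assumed tokenizer)."""
    return line.split()


def similarity(template: list[str], tokens: list[str]) -> float:
    """Fraction of positions where template and line hold the same literal token."""
    if len(template) != len(tokens):
        return 0.0
    if not tokens:
        return 1.0
    hits = sum(1 for t, tok in zip(template, tokens) if t == tok)
    return hits / len(tokens)


@dataclass
class Group:
    template: list[str]
    count: int = 0

    def absorb(self, tokens: list[str]) -> None:
        """Generalize the template: positions that disagree become wildcards."""
        self.template = [
            t if t == tok else WILDCARD for t, tok in zip(self.template, tokens)
        ]
        self.count += 1


class OnlineGrouper:
    """Assigns each line to the most similar group, or opens a new one."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        # Groups are bucketed by token count so only same-length templates compete.
        self.groups_by_length: dict[int, list[Group]] = {}

    def parse(self, line: str) -> str:
        tokens = tokenize(line)
        candidates = self.groups_by_length.setdefault(len(tokens), [])
        best = max(
            candidates, key=lambda g: similarity(g.template, tokens), default=None
        )
        if best is None or similarity(best.template, tokens) < self.threshold:
            best = Group(template=list(tokens))
            candidates.append(best)
        best.absorb(tokens)
        return " ".join(best.template)
```

Feeding `user alice logged in from 10.0.0.1` and then `user bob logged in from 10.0.0.2` into one `OnlineGrouper` leaves a single group whose template is `user <*> logged in from <*>`. A flat list per token count is used here only for brevity; it does not reproduce the tree structure or the complexity bound the paper reports.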

All operations are O(log N) in the number of active templates, making the approach lightweight enough for high‑throughput pipelines.
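
The split and merge operations from the Methodology list can be illustrated with a short sketch. The summary does not define "internal variance" or the merge criterion, so this example substitutes simple proxies: the share of wildcard positions in a leaf's template stands in for variance, and positional token agreement stands in for template similarity. The function names and the thresholds `max_wildcard_ratio` and `min_similarity` are hypothetical, and the merge sketch ignores node frequency.

```python
from collections import defaultdict

WILDCARD = "<*>"  # hypothetical marker for a variable token position


def merge_templates(a, b):
    """Position-wise merge: keep tokens that agree, wildcard the rest."""
    return [x if x == y else WILDCARD for x, y in zip(a, b)]


def derive_template(lines):
    """Fold all member lines (equal-length token lists) into one template."""
    template = list(lines[0])
    for tokens in lines[1:]:
        template = merge_templates(template, tokens)
    return template


def wildcard_ratio(template):
    """Proxy for 'internal variance': share of positions that are wildcards."""
    if not template:
        return 0.0
    return sum(t == WILDCARD for t in template) / len(template)


def split_leaf(lines, max_wildcard_ratio=0.5):
    """Split a leaf's members when its template is too generic.

    Returns a list of member groups; a single group means no split happened.
    """
    template = derive_template(lines)
    if wildcard_ratio(template) <= max_wildcard_ratio:
        return [lines]
    # Pivot on the wildcard position with the fewest distinct values,
    # which is the most likely to be a fixed keyword rather than a parameter.
    positions = [i for i, t in enumerate(template) if t == WILDCARD]
    pivot = min(positions, key=lambda i: len({toks[i] for toks in lines}))
    children = defaultdict(list)
    for toks in lines:
        children[toks[pivot]].append(toks)
    return list(children.values())


def maybe_merge(a, b, min_similarity=0.75):
    """Return the merged template if a and b agree on enough positions, else None."""
    if not a or len(a) != len(b):
        return None
    agree = sum(x == y for x, y in zip(a, b))
    if agree / len(a) < min_similarity:
        return None
    return merge_templates(a, b)
```

With four lines of the form `job <id> started on <node>` and `job <id> stopped on <node>`, `split_leaf` pivots on the third token and returns two children, one for `started` and one for `stopped`. Going the other way, `maybe_merge` folds `conn <*> closed by peer` and `conn <*> closed by client` into `conn <*> closed by <*>`.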

Results & Findings

Metric	KELP	Best Heuristic Baseline
Parsing Accuracy (on new benchmark)	95.3 %	78.1 %
Throughput (lines/sec)	3.2 M	2.9 M
Latency (99th‑pct)	1.8 ms	2.4 ms
Recovery time after schema drift (s)	< 5	> 30

Accuracy stays high even when log formats change every few minutes, whereas static parsers quickly degrade.
Throughput is slightly higher than the best heuristic baseline (3.2 M vs. 2.9 M lines/sec), indicating that the extra tree‑maintenance work does not become a bottleneck.
Adaptation speed: KELP automatically re‑clusters within seconds of a drift, eliminating the need for manual template updates.
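
The summary does not say how the paper computes "parsing accuracy". One common choice in log-parsing evaluations is grouping accuracy, where a line counts as correct only if its predicted group contains exactly the same lines as its ground-truth group. The sketch below assumes that definition; the function name and the label format are illustrative, not taken from the paper.

```python
from collections import defaultdict


def grouping_accuracy(predicted, truth):
    """Share of lines whose predicted group equals their ground-truth group.

    `predicted` and `truth` are equal-length sequences of group labels,
    one label per log line, in the same line order.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        return 0.0
    pred_groups = defaultdict(set)
    true_groups = defaultdict(set)
    for i, (p, t) in enumerate(zip(predicted, truth)):
        pred_groups[p].add(i)
        true_groups[t].add(i)
    # A line is correct only when both groups hold the same set of line indices.
    correct = sum(
        1 for p, t in zip(predicted, truth) if pred_groups[p] == true_groups[t]
    )
    return correct / len(truth)
```

Under this definition, over-splitting is penalized as heavily as over-merging: if a parser splits one true template into two groups, every line of that template counts as wrong.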

Practical Implications

Reduced Ops toil – Teams no longer need to maintain fragile regex libraries or schedule periodic re‑training of parsers.
More reliable alerting – Since parsing stays accurate during rollouts or feature flags, downstream anomaly‑detection and alert pipelines generate fewer false negatives.
Scalable observability stacks – KELP’s low latency and high throughput make it a candidate replacement for the parsing stage in high‑volume pipelines built on log shippers such as Fluent Bit or Logstash.
Edge deployment – The algorithm’s modest memory footprint (tree size proportional to active templates) enables on‑device parsing for IoT gateways or edge services.
Foundation for downstream ML – Consistently clean, template‑labeled logs improve the quality of downstream models for root‑cause analysis, capacity planning, and security monitoring.

Limitations & Future Work

Memory growth in extremely heterogeneous fleets – If the number of distinct log templates explodes, the tree can become large; the authors suggest pruning strategies as future work.
Handling multi‑line log events – KELP currently assumes one‑line events; extending the EGT to group multi‑line events (e.g., stack traces) is left for later research.
Parameter sensitivity – Split/merge thresholds need modest tuning for different workloads; an adaptive self‑tuning mechanism is a planned enhancement.
Benchmark generality – While the new dataset is more realistic than prior static benchmarks, it still originates from a single organization’s stack; broader cross‑industry validation would strengthen claims.

Authors

Satyam Singh
Sai Niranjan Ramachandran

Paper Information

arXiv ID: 2601.00633v1
Categories: cs.DB, cs.SE
Published: January 2, 2026
