[Paper] KELP: Robust Online Log Parsing Through Evolutionary Grouping Trees
Source: arXiv - 2601.00633v1
Overview
The paper presents KELP (Kelp Evolutionary Log Parser), a high‑throughput online log‑parsing system that stays accurate even as log formats drift in production. By replacing static template models with a continuously evolving “Evolutionary Grouping Tree,” KELP can automatically discover, split, and merge log templates on the fly, dramatically reducing the brittleness that plagues existing parsers.
Key Contributions
- Evolutionary Grouping Tree (EGT) – a novel data structure that treats log‑template discovery as an online clustering problem, allowing nodes to evolve as new log lines arrive.
- Robust online parsing algorithm – KELP updates its tree in real time, handling schema drifts without manual re‑training or rule changes.
- Realistic benchmark dataset – the authors construct a new evaluation suite that captures structural ambiguity and frequent format changes seen in modern production systems, addressing shortcomings of existing static, regex‑based benchmarks.
- Empirical validation – experiments show KELP retains over 95 % parsing accuracy on the new benchmark (95.3 % vs. 78.1 % for the best heuristic baseline) while processing about 3.2 million log lines per second, outperforming state‑of‑the‑art heuristic parsers.
- Open‑source release – full implementation and benchmark data are publicly available (codeberg.org/stonebucklabs/kelp), enabling reproducibility and community extensions.
Methodology
- Streaming Ingestion – Each incoming log line is tokenized and fed into the EGT.
- Tree Navigation – The line traverses the tree based on token similarity scores; leaf nodes represent current candidate templates.
- Evolution Operations
  - Split: if a leaf node’s internal variance exceeds a threshold, it is split into more specific child nodes.
  - Merge: rarely used nodes that become similar over time are merged to avoid fragmentation.
  - Re‑evaluation: node frequencies are continuously updated, allowing the tree to adapt to shifting log‑type distributions.
- Template Extraction – When a leaf node stabilizes (low variance, high frequency), its pattern is emitted as a concrete log template for downstream analytics.
- Benchmark Construction – The authors replay logs from a large‑scale microservice deployment, deliberately injecting schema changes (new fields, reordered tokens, optional sections) to mimic real‑world drift. Ground truth is generated by a semi‑automated labeling pipeline rather than static regexes.
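The summary names three kinds of injected drift but not how they were produced. The Python sketch below shows one way to replay logs with stacked schema changes, assuming whitespace-tokenized lines and line offsets as a stand-in for time. The mutation functions (`add_field`, `reorder`, `drop_optional`), the `region=eu-west-1` field and the bracketed-optional-section convention are all hypothetical illustrations, not the authors' tooling, and the sketch does not reproduce their semi-automated ground-truth labeling.

```python
# Hypothetical mutation functions, one per drift kind named in the summary.
def add_field(tokens):
    # New field: every line now ends with an extra key=value token (illustrative).
    return tokens + ["region=eu-west-1"]


def reorder(tokens):
    # Reordered tokens: the first two tokens swap places.
    return tokens[1:2] + tokens[:1] + tokens[2:]


def drop_optional(tokens):
    # Optional section: bracketed tokens such as "[trace=a1]" disappear.
    return [t for t in tokens if not (t.startswith("[") and t.endswith("]"))]


MUTATIONS = {"new_field": add_field, "reorder": reorder, "drop_optional": drop_optional}


def inject_drift(lines, schedule):
    """Yield (drifted_line, applied_mutations) for each input line.

    `schedule` is a list of (start_index, mutation_name) pairs sorted by start_index.
    Each change stays in effect from its start index on and stacks on earlier ones,
    mimicking successive schema changes. Whitespace is normalized to single spaces.
    """
    applied, k = [], 0
    for n, line in enumerate(lines):
        # Activate every scheduled change whose start index has been reached.
        while k < len(schedule) and schedule[k][0] <= n:
            applied.append(schedule[k][1])
            k += 1
        tokens = line.split()
        for name in applied:
            tokens = MUTATIONS[name](tokens)
        yield " ".join(tokens), tuple(applied)


logs = [
    "INFO 12:00:01 [trace=a1] user alice logged in",
    "INFO 12:00:02 [trace=b7] user bob logged in",
    "INFO 12:00:03 [trace=c3] user carol logged in",
    "INFO 12:00:04 [trace=d9] user dave logged in",
]
for line, drift in inject_drift(logs, [(1, "drop_optional"), (3, "new_field")]):
    print(drift, line)
```

Stacking the changes, rather than mutating lines independently, keeps each drifted format persistent, which is closer to a deploy that changes a log statement than to random per-line noise.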
All tree operations run in O(log N) time, where N is the number of active templates, which keeps the approach lightweight enough for high‑throughput pipelines.
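The Methodology steps can be illustrated with a small streaming loop. This is a minimal Python sketch under loud assumptions, not KELP's implementation: whitespace tokenization, a two-level tree (token count, then a list of leaves), and hypothetical names (`GroupingTree`, `Leaf`, `similarity`, `merge_rare`) and thresholds chosen only for illustration. "Internal variance" is approximated as the share of wildcard positions in a leaf's template, a split partitions a leaf's stored lines on the wildcard position with the fewest distinct values, and leaves keep every routed line in memory. Navigation scans every leaf in a bucket, so this sketch does not reproduce the O(log N) bound.

```python
from dataclasses import dataclass, field

WILDCARD = "<*>"  # marks a variable token position in a template


@dataclass(eq=False)  # leaves compare by identity, not by value
class Leaf:
    """A candidate template plus every token list routed to it (unbounded here)."""
    template: list
    members: list = field(default_factory=list)

    def absorb(self, tokens):
        # Positions where the new line disagrees with the template become wildcards.
        self.template = [t if t == tok else WILDCARD for t, tok in zip(self.template, tokens)]
        self.members.append(tokens)

    def variance(self):
        # Crude stand-in for "internal variance": share of wildcard positions.
        return self.template.count(WILDCARD) / len(self.template)


def similarity(template, tokens):
    # Fraction of positions that match exactly or fall on a wildcard.
    return sum(t == WILDCARD or t == tok for t, tok in zip(template, tokens)) / len(tokens)


def build_leaf(group):
    # Rebuild a leaf (template + members) from a non-empty list of token lists.
    leaf = Leaf(list(group[0]))
    for tokens in group:
        leaf.absorb(tokens)
    return leaf


class GroupingTree:
    """Two-level sketch: token count -> list of leaves (candidate templates)."""

    def __init__(self, sim=0.5, split_at=0.5, min_split=4, stable_count=3, stable_var=0.5):
        self.buckets = {}
        self.sim, self.split_at, self.min_split = sim, split_at, min_split
        self.stable_count, self.stable_var = stable_count, stable_var

    def add(self, line):
        tokens = line.split()  # streaming ingestion: whitespace tokenization
        if not tokens:
            return
        leaves = self.buckets.setdefault(len(tokens), [])
        # Navigation: route the line to the most similar leaf in its bucket.
        best = max(leaves, key=lambda leaf: similarity(leaf.template, tokens), default=None)
        if best is None or similarity(best.template, tokens) < self.sim:
            leaves.append(build_leaf([tokens]))  # no close match: start a new template
            return
        best.absorb(tokens)
        if best.variance() > self.split_at and len(best.members) >= self.min_split:
            self._split(leaves, best)

    def _split(self, leaves, leaf):
        # Split on the wildcard position with the fewest distinct values.
        wild = [i for i, t in enumerate(leaf.template) if t == WILDCARD]
        pos = min(wild, key=lambda i: len({m[i] for m in leaf.members}))
        groups = {}
        for m in leaf.members:
            groups.setdefault(m[pos], []).append(m)
        leaves.remove(leaf)
        leaves.extend(build_leaf(g) for g in groups.values())

    def merge_rare(self, rare=2, threshold=0.8):
        # Merge pairs where one leaf is rarely used and the templates have converged.
        for leaves in self.buckets.values():
            i = 0
            while i < len(leaves):
                j = i + 1
                while j < len(leaves):
                    a, b = leaves[i], leaves[j]
                    rarely_used = min(len(a.members), len(b.members)) <= rare
                    if rarely_used and similarity(a.template, b.template) >= threshold:
                        leaves[i] = build_leaf(a.members + b.members)
                        del leaves[j]
                    else:
                        j += 1
                i += 1

    def templates(self):
        # Extraction: emit only leaves that are frequent and low-variance.
        return sorted(
            " ".join(leaf.template)
            for leaves in self.buckets.values()
            for leaf in leaves
            if len(leaf.members) >= self.stable_count and leaf.variance() <= self.stable_var
        )


tree = GroupingTree(stable_count=2)
for line in ["svc x start ok", "svc y start ok", "svc z stop fail", "svc w stop fail"]:
    tree.add(line)
print(tree.templates())  # ['svc <*> start ok', 'svc <*> stop fail']
```

After the third line the bucket holds a single over-general leaf, `svc <*> <*> <*>`; the fourth line brings it to `min_split` members with a wildcard share of 0.75, so it splits on the position holding `start`/`stop`. `merge_rare` is meant to be called periodically rather than on every line, since templates only become more general over time and rare leaves may converge.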
Results & Findings
| Metric | KELP | Best Heuristic Baseline |
|---|---|---|
| Parsing Accuracy (on new benchmark) | 95.3 % | 78.1 % |
| Throughput (lines/sec) | 3.2 M | 2.9 M |
| Latency (99th percentile) | 1.8 ms | 2.4 ms |
| Recovery time after schema drift (s) | < 5 | > 30 |
- Accuracy stays high even when log formats change every few minutes, whereas static parsers quickly degrade.
- Throughput is slightly higher than the best baseline (3.2 M vs. 2.9 M lines/sec, roughly 10 % more), confirming that the extra tree‑maintenance work does not become a bottleneck.
- Adaptation speed: KELP automatically re‑clusters within seconds of a drift, eliminating the need for manual template updates.
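The table's "Parsing Accuracy" is not defined in this summary. A common choice in log-parsing benchmarks such as Loghub is grouping accuracy: a message counts as correctly parsed only if the set of messages the parser places in its group is exactly the set in its ground-truth group. The Python sketch below computes that metric under the assumption that it matches what the table reports; the paper may define accuracy differently, and `grouping_accuracy` is a hypothetical helper name.

```python
from collections import defaultdict


def grouping_accuracy(predicted, truth):
    """Fraction of messages whose predicted group matches their true group exactly.

    `predicted` and `truth` hold one template label per message, in the same order.
    Label values need not agree between the two lists; only the partitions matter.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have one label per message")
    if not truth:
        return 0.0  # convention for empty input in this sketch
    pred_groups, true_groups = defaultdict(set), defaultdict(set)
    for i, (p, t) in enumerate(zip(predicted, truth)):
        pred_groups[p].add(i)
        true_groups[t].add(i)
    true_sets = {frozenset(s) for s in true_groups.values()}
    # A predicted group counts only if it reproduces a ground-truth group exactly;
    # every message in such a group is then "correctly parsed".
    correct = sum(len(s) for s in pred_groups.values() if frozenset(s) in true_sets)
    return correct / len(truth)


# Messages 0-1 are grouped correctly; the parser splits group B and
# lumps message 3 together with message 4 from group C.
print(grouping_accuracy([1, 1, 2, 3, 3], ["A", "A", "B", "B", "C"]))  # 0.4
```

Because a partially correct group earns no credit, this metric penalizes both over-splitting and over-merging, which is why a parser that fragments templates after a format change loses accuracy quickly.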
Practical Implications
- Reduced Ops toil – Teams no longer need to maintain fragile regex libraries or schedule periodic re‑training of parsers.
- More reliable alerting – Since parsing stays accurate during rollouts or feature‑flag changes, downstream anomaly‑detection and alert pipelines generate fewer false negatives.
- Scalable observability stacks – KELP’s low latency and high throughput make it a candidate drop‑in parsing stage for high‑volume pipelines built on log shippers such as Fluent Bit or Logstash.
- Edge deployment – The algorithm’s modest memory footprint (tree size proportional to active templates) enables on‑device parsing for IoT gateways or edge services.
- Foundation for downstream ML – Consistently clean, template‑labeled logs improve the quality of downstream models for root‑cause analysis, capacity planning, and security monitoring.
Limitations & Future Work
- Memory growth in extremely heterogeneous fleets – If the number of distinct log templates explodes, the tree can become large; the authors suggest pruning strategies as future work.
- Handling multi‑line log events – KELP currently assumes one‑line events; extending the EGT to group multi‑line events (e.g., stack traces) is left for future work.
- Parameter sensitivity – Split/merge thresholds need modest tuning for different workloads; an adaptive self‑tuning mechanism is a planned enhancement.
- Benchmark generality – While the new dataset is more realistic than prior static benchmarks, it still originates from a single organization’s stack; broader cross‑industry validation would strengthen claims.
Authors
- Satyam Singh
- Sai Niranjan Ramachandran
Paper Information
- arXiv ID: 2601.00633v1
- Categories: cs.DB, cs.SE
- Published: January 2, 2026