[Paper] KELP: Robust Online Log Parsing Through Evolutionary Grouping Trees

Published: January 2, 2026 at 05:27 AM EST
4 min read
Source: arXiv - 2601.00633v1

Overview

The paper presents KELP (Kelp Evolutionary Log Parser), a high‑throughput online log‑parsing system that stays accurate even as log formats drift in production. By replacing static template models with a continuously evolving “Evolutionary Grouping Tree,” KELP can automatically discover, split, and merge log templates on the fly, dramatically reducing the brittleness that plagues existing parsers.

Key Contributions

  • Evolutionary Grouping Tree (EGT) – a novel data structure that treats log‑template discovery as an online clustering problem, allowing nodes to evolve as new log lines arrive.
  • Robust online parsing algorithm – KELP updates its tree in real time, handling schema drifts without manual re‑training or rule changes.
  • Realistic benchmark dataset – the authors construct a new evaluation suite that captures structural ambiguity and frequent format changes seen in modern production systems, addressing shortcomings of existing static, regex‑based benchmarks.
  • Empirical validation – experiments show KELP retains > 95 % parsing accuracy on the new benchmark while processing millions of log lines per second, outperforming state‑of‑the‑art heuristic parsers.
  • Open‑source release – full implementation and benchmark data are publicly available (codeberg.org/stonebucklabs/kelp), enabling reproducibility and community extensions.
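To make the EGT concrete, the sketch below shows what a grouping‑tree node could plausibly carry; the field names, thresholds, and stability test are illustrative assumptions, not the authors' implementation.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class EGTNode:
    """Hypothetical sketch of an Evolutionary Grouping Tree node (illustrative only)."""
    pattern: list[str]                      # token pattern; "<*>" marks a variable position
    children: list[EGTNode] = field(default_factory=list)
    count: int = 0                          # log lines routed through this node so far
    variance: float = 0.0                   # token-level dispersion used to trigger splits

    def is_leaf(self) -> bool:
        return not self.children

    def is_stable(self, min_count: int = 100, max_variance: float = 0.1) -> bool:
        # A frequently hit, low-variance leaf can be emitted as a concrete template.
        return self.is_leaf() and self.count >= min_count and self.variance <= max_variance
```

Leaves that pass such a stability test would correspond to the templates emitted in the extraction step of the methodology below.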

Methodology

  1. Streaming Ingestion – Each incoming log line is tokenized and fed into the EGT.
  2. Tree Navigation – The line traverses the tree based on token similarity scores; leaf nodes represent current candidate templates.
  3. Evolution Operations
    • Split: if a leaf node’s internal variance exceeds a threshold, it is split into more specific child nodes.
    • Merge: rarely used nodes that become similar over time are merged to avoid fragmentation.
    • Re‑evaluation: node frequencies are continuously updated, allowing the tree to adapt to shifting log‑type distributions.
  4. Template Extraction – When a leaf node stabilizes (low variance, high frequency), its pattern is emitted as a concrete log template for downstream analytics.
  5. Benchmark Construction – The authors replay logs from a large‑scale microservice deployment, deliberately injecting schema changes (new fields, reordered tokens, optional sections) to mimic real‑world drift. Ground truth is generated by a semi‑automated labeling pipeline rather than static regexes.
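To give a flavour of step 5, the snippet below shows one way schema changes of the kinds described (new fields, reordered tokens, optional sections) could be injected into replayed lines; the concrete transformations and drift rate are assumptions for illustration, not the authors' benchmark pipeline.

```python
import random

# Hypothetical drift injectors for replayed log lines (illustrative only; the
# benchmark's actual transformations and labeling pipeline are not reproduced here).

def add_field(tokens):
    # A rollout introduces a new field, e.g. a tracing identifier.
    return tokens + ["trace_id=<id>"]

def reorder_tokens(tokens):
    # A format change swaps the last two fields.
    if len(tokens) < 2:
        return tokens
    return tokens[:-2] + [tokens[-1], tokens[-2]]

def drop_optional_section(tokens):
    # An optional trailing section is present only for some events.
    return tokens[:-1] if len(tokens) > 3 and random.random() < 0.5 else tokens

def replay_with_drift(lines, drift_every=50_000):
    """Yield lines, toggling to a perturbed schema every `drift_every` lines."""
    drifted = False
    for i, line in enumerate(lines):
        if i and i % drift_every == 0:
            drifted = not drifted
        tokens = line.split()
        if drifted:
            tokens = drop_optional_section(add_field(reorder_tokens(tokens)))
        yield " ".join(tokens)
```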

All operations are O(log N) in the number of active templates, making the approach lightweight enough for high‑throughput pipelines; a simplified sketch of the matching loop follows.
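As a rough illustration of the online loop in steps 1–4, here is a deliberately simplified, flat sketch. The real parser routes lines through the hierarchical EGT and applies the split, merge, and re‑evaluation operations described above; the tokenizer, similarity measure, and threshold below are assumptions chosen for readability, not the authors' choices.

```python
WILDCARD = "<*>"

def tokenize(line):
    # Step 1: naive whitespace tokenization (real systems also normalize numbers, IPs, timestamps).
    return line.strip().split()

def similarity(tokens, pattern):
    # Fraction of positions where the token matches the pattern (wildcards match anything).
    if len(tokens) != len(pattern):
        return 0.0
    hits = sum(1 for t, p in zip(tokens, pattern) if p == WILDCARD or t == p)
    return hits / len(pattern)

class SketchParser:
    """Deliberately simplified online parser: one flat bucket of candidates per token count.
    The paper's EGT organizes candidates hierarchically and adds split/merge operations;
    this sketch only shows the match -> generalize -> emit cycle."""

    def __init__(self, match_threshold=0.6):
        self.match_threshold = match_threshold
        self.buckets = {}  # token count -> list of {"pattern": [...], "count": int}

    def parse(self, line):
        tokens = tokenize(line)
        bucket = self.buckets.setdefault(len(tokens), [])

        # Step 2: find the most similar existing candidate (tree traversal in the paper).
        best, best_sim = None, 0.0
        for cand in bucket:
            sim = similarity(tokens, cand["pattern"])
            if sim > best_sim:
                best, best_sim = cand, sim

        if best is None or best_sim < self.match_threshold:
            # No close candidate: start a new group (akin to a split creating a new leaf).
            best = {"pattern": list(tokens), "count": 0}
            bucket.append(best)

        # Steps 3-4: generalize differing positions to wildcards and update statistics.
        best["pattern"] = [p if p == t else WILDCARD
                           for p, t in zip(best["pattern"], tokens)]
        best["count"] += 1
        return best["pattern"]

# Repeated lines of the same shape converge to a single wildcarded pattern.
parser = SketchParser()
parser.parse("conn from 10.0.0.1 port 443")
print(parser.parse("conn from 10.0.0.9 port 8080"))   # ['conn', 'from', '<*>', 'port', '<*>']
```

Stable, frequently matched patterns like the one printed above are what get emitted as concrete templates for downstream analytics.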

Results & Findings

| Metric | KELP | Best Heuristic Baseline |
| --- | --- | --- |
| Parsing Accuracy (on new benchmark) | 95.3 % | 78.1 % |
| Throughput (lines/sec) | 3.2 M | 2.9 M |
| Latency (99th percentile) | 1.8 ms | 2.4 ms |
| Recovery time after schema drift (s) | < 5 | > 30 |
  • Accuracy stays high even when log formats change every few minutes, whereas static parsers quickly degrade.
  • Throughput remains comparable to existing parsers, confirming that the extra tree‑maintenance work does not become a bottleneck.
  • Adaptation speed: KELP automatically re‑clusters within seconds of a drift, eliminating the need for manual template updates.

Practical Implications

  • Reduced Ops toil – Teams no longer need to maintain fragile regex libraries or schedule periodic re‑training of parsers.
  • More reliable alerting – Since parsing stays accurate during rollouts or feature flags, downstream anomaly‑detection and alert pipelines generate fewer false negatives.
  • Scalable observability stacks – KELP’s low latency and high throughput make it a drop‑in replacement for log shippers (e.g., Fluent Bit, Logstash) in high‑volume environments.
  • Edge deployment – The algorithm’s modest memory footprint (tree size proportional to active templates) enables on‑device parsing for IoT gateways or edge services.
  • Foundation for downstream ML – Consistently clean, template‑labeled logs improve the quality of downstream models for root‑cause analysis, capacity planning, and security monitoring.

Limitations & Future Work

  • Memory growth in extremely heterogeneous fleets – If the number of distinct log templates explodes, the tree can become large; the authors suggest pruning strategies as future work.
  • Handling multi‑line log events – KELP currently assumes one‑line events; extending the EGT to group multi‑line stacks (e.g., stack traces) is left for later research.
  • Parameter sensitivity – Split/merge thresholds need modest tuning for different workloads; an adaptive self‑tuning mechanism is a planned enhancement.
  • Benchmark generality – While the new dataset is more realistic than prior static benchmarks, it still originates from a single organization’s stack; broader cross‑industry validation would strengthen claims.

Authors

  • Satyam Singh
  • Sai Niranjan Ramachandran

Paper Information

  • arXiv ID: 2601.00633v1
  • Categories: cs.DB, cs.SE
  • Published: January 2, 2026