[Paper] LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models

Published: December 4, 2025 at 12:30 AM EST
4 min read
Source: arXiv - 2512.04474v1

Overview

The paper introduces LLM‑SrcLog, a new framework that extracts log‑message templates directly from a system’s source code using large language models (LLMs). By combining static code analysis with data‑driven parsing, the approach delivers high‑accuracy log templates while keeping the runtime overhead low enough for production use.

Key Contributions

  • Proactive template extraction: Generates log templates from source code before the software is deployed, reducing reliance on post‑hoc log mining.
  • Unified pipeline: Merges a white‑box LLM‑based extractor (code‑driven) with a black‑box data‑driven extractor to handle logs that lack accessible source code.
  • Cross‑function static analysis: Reconstructs logging contexts across functions and modules, enabling the LLM to distinguish constant text from variable placeholders (an example of the resulting template format follows this list).
  • Performance gains: Achieves 2–17 % higher F1‑score than LLM‑only baselines and 8–35 % over traditional parsers, while cutting online parsing latency by ~1,000× compared with per‑log LLM inference.
  • Real‑world validation: Demonstrates effectiveness on public benchmarks (Hadoop, Zookeeper) and a large‑scale industrial system (Sunfire‑Compute), plus case studies in a production environment.
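
To make the constant-vs-variable distinction concrete, here is a minimal, hypothetical illustration; the log statement, identifiers, and template below are invented for this post, not drawn from the paper's benchmarks:

```python
import re

# Hypothetical source statement the analyzer might encounter (Java-style):
#   LOG.info("Block {} replicated to {} nodes in {} ms", blockId, n, elapsed);
# The extracted template keeps the constant text verbatim and replaces each
# runtime variable with a placeholder token.
template = "Block <*> replicated to <*> nodes in <*> ms"

# Turning the template into a regex shows how a raw production log line
# can then be split into structured fields.
pattern = re.compile(re.escape(template).replace(re.escape("<*>"), r"(\S+)"))
print(pattern.match("Block blk_4829 replicated to 3 nodes in 12.47 ms").groups())
# -> ('blk_4829', '3', '12.47')
```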

Methodology

  1. Static Code Analyzer – Traverses the codebase, follows logging API calls, and builds a logging context graph that captures the surrounding statements, variable names, and control‑flow information.
  2. White‑box LLM Extractor – Feeds the reconstructed contexts to a pre‑trained LLM (e.g., GPT‑4) prompted to output a template and to label each token as a constant or a variable. Post‑processing cleans up the output (e.g., merging adjacent constants); a sketch of this step follows the list.
  3. Black‑box Data‑driven Extractor – For logs whose source code is unavailable or incomplete, the system falls back to a clustering‑based parser (e.g., Drain) that groups similar log lines and infers templates.
  4. Fusion Layer – Merges the two template sets, resolves conflicts, and builds a final lookup table that can be applied at runtime with negligible latency.
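
To make step 2 more tangible, here is a minimal sketch of how a reconstructed logging context could be sent to an LLM and lightly post‑processed. The prompt wording, the example context, and the use of the `openai` Python client with a GPT‑4 model are illustrative assumptions; the paper does not publish its exact prompts.

```python
import re
from openai import OpenAI  # assumes the openai>=1.0 Python client is installed

client = OpenAI()

# Hypothetical logging context reconstructed by the static analyzer (step 1):
# the logging call plus the surrounding statements that define its variables.
context = '''
String path = request.getPath();
int retries = conf.getInt("dfs.client.retries", 3);
LOG.warn("Failed to open " + path + " after " + retries + " retries");
'''

PROMPT = (
    "Given the logging statement and its surrounding code below, output the "
    "log template on a single line. Keep constant text verbatim and replace "
    "every runtime variable with the placeholder <*>.\n\n{code}"
)

resp = client.chat.completions.create(
    model="gpt-4",  # the paper cites GPT-4 as an example; any capable model could be substituted
    messages=[{"role": "user", "content": PROMPT.format(code=context)}],
    temperature=0,
)
template = resp.choices[0].message.content.strip()

# Simple post-processing: collapse runs of consecutive placeholders so that
# concatenated variables yield a single <*> token.
template = re.sub(r"(<\*>)(\s*<\*>)+", r"\1", template)
print(template)  # expected along the lines of: Failed to open <*> after <*> retries
```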

The whole pipeline runs offline (once per release), so the heavy LLM inference happens only during development, not in the production logging pipeline; at runtime, log lines are matched against the precomputed lookup table, as sketched below.
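Under the same assumptions, the runtime side could look like the following sketch: the fused templates are compiled into regular expressions once, offline, and each production log line is matched against that table without any LLM call. The template strings and the linear-scan matching are illustrative; the paper's actual lookup structure may differ.

```python
import re

# Offline (build stage): compile the fused template set into regexes once.
TEMPLATES = [
    "Failed to open <*> after <*> retries",
    "Block <*> replicated to <*> nodes in <*> ms",
]

def compile_template(template: str) -> re.Pattern:
    """Turn a '<*>'-style template into an anchored regex with capture groups."""
    return re.compile(
        "^" + re.escape(template).replace(re.escape("<*>"), r"(.+?)") + "$"
    )

LOOKUP = [(t, compile_template(t)) for t in TEMPLATES]

# Online (production): match each raw line against the table -- no LLM inference.
def parse(line: str):
    for template, pattern in LOOKUP:
        match = pattern.match(line)
        if match:
            return template, match.groups()
    return None, ()  # unmatched lines would fall back to the data-driven parser

print(parse("Failed to open /tmp/app.log after 3 retries"))
# -> ('Failed to open <*> after <*> retries', ('/tmp/app.log', '3'))
```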

Results & Findings

| Dataset | Baseline (Drain) F1 | LLM‑only (prompted) F1 | LLM‑SrcLog F1 |
|---|---|---|---|
| Hadoop | 0.84 | 0.86 | 0.93 (+9 %) |
| Zookeeper | 0.81 | 0.83 | 0.95 (+12 %) |
| Sunfire‑Compute (industrial) | 0.78 | 0.80 | 0.96 (+16 %) |

  • Latency: Average per‑log parsing time drops from ~150 ms (per‑log LLM) to ~0.15 ms, matching traditional parsers.
  • Coverage: >95 % of logs are matched by the code‑driven extractor; the remaining logs are handled by the clustering fallback.
  • Case studies: In a production AIOps pipeline, LLM‑SrcLog reduced false‑positive anomaly alerts by 22 % and cut the time to diagnose a failure from 30 min to under 5 min.

Practical Implications

  • Faster AIOps pipelines – Accurate templates enable downstream anomaly detection and root‑cause analysis tools to work with cleaner, structured data, reducing noise and alert fatigue.
  • Lower operational cost – By moving the expensive LLM inference to the build stage, teams avoid the need for costly GPU resources in production environments.
  • Easier onboarding of new services – Developers can generate log templates automatically as part of CI/CD, ensuring consistency across microservices without manual template maintenance.
  • Improved observability for legacy code – The hybrid design means even systems without source access (e.g., third‑party binaries) still benefit from data‑driven parsing, while any available code is leveraged for higher precision.
  • Security & compliance – Structured logs make it simpler to apply masking, retention policies, and audit trails, which is valuable for regulated industries.

Limitations & Future Work

  • Dependency on logging conventions – The static analyzer assumes a relatively uniform logging API; heavily customized or dynamically generated log statements may be missed.
  • LLM prompt engineering – Quality of extracted templates hinges on prompt design; adapting to different programming languages or frameworks may require additional tuning.
  • Scalability of code analysis – For extremely large monorepos, the static analysis step can become a bottleneck; incremental analysis techniques are suggested as a next step.
  • Extending to multi‑modal logs – Future work includes handling structured logs that embed JSON or protobuf payloads, and integrating feedback loops from runtime parsing errors to refine the template set automatically.

Authors

  • Jiaqi Sun
  • Wei Li
  • Heng Zhang
  • Chutong Ding
  • Shiyou Qian
  • Jian Cao
  • Guangtao Xue

Paper Information

  • arXiv ID: 2512.04474v1
  • Categories: cs.SE
  • Published: December 4, 2025