[Paper] LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models

Published: December 4, 2025 at 12:30 AM EST
4 min read
Source: arXiv - 2512.04474v1

Overview

The paper introduces LLM‑SrcLog, a new framework that extracts log‑message templates directly from a system’s source code using large language models (LLMs). By combining static code analysis with data‑driven parsing, the approach delivers high‑accuracy log templates while keeping the runtime overhead low enough for production use.

Key Contributions

  • Proactive template extraction: Generates log templates from source code before the software is deployed, reducing reliance on post‑hoc log mining.
  • Unified pipeline: Merges a white‑box LLM‑based extractor (code‑driven) with a black‑box data‑driven extractor to handle logs that lack accessible source code.
  • Cross‑function static analysis: Reconstructs logging contexts across functions and modules, enabling the LLM to distinguish constant text from variable placeholders (an example of the resulting template format follows this list).
  • Performance gains: Achieves 2–17 % higher F1‑score than LLM‑only baselines and 8–35 % over traditional parsers, while cutting online parsing latency by ~1,000× compared with per‑log LLM inference.
  • Real‑world validation: Demonstrates effectiveness on public benchmarks (Hadoop, Zookeeper) and a large‑scale industrial system (Sunfire‑Compute), plus case studies in a production environment.
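
To make the constant-vs-variable distinction concrete, here is a minimal, hypothetical illustration; the log statement, identifiers, and template below are invented for this post, not drawn from the paper's benchmarks:

```python
import re

# Hypothetical source statement the analyzer might encounter (Java-style):
#   LOG.info("Block {} replicated to {} nodes in {} ms", blockId, n, elapsed);
# The extracted template keeps the constant text verbatim and replaces each
# runtime variable with a placeholder token.
template = "Block <*> replicated to <*> nodes in <*> ms"

# Turning the template into a regex shows how a raw production log line
# can then be split into structured fields.
pattern = re.compile(re.escape(template).replace(re.escape("<*>"), r"(\S+)"))
print(pattern.match("Block blk_4829 replicated to 3 nodes in 12.47 ms").groups())
# -> ('blk_4829', '3', '12.47')
```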

Methodology

  1. Static Code Analyzer – Traverses the codebase, follows logging API calls, and builds a logging context graph that captures the surrounding statements, variable names, and control‑flow information.
  2. White‑box LLM Extractor – Feeds the reconstructed contexts to a pre‑trained LLM (e.g., GPT‑4) prompted to output a template and to label each token as a constant or a variable. Post‑processing cleans up the output (e.g., merging adjacent constants); a sketch of this step follows the list.
  3. Black‑box Data‑driven Extractor – For logs whose source code is unavailable or incomplete, the system falls back to a clustering‑based parser (e.g., Drain) that groups similar log lines and infers templates.
  4. Fusion Layer – Merges the two template sets, resolves conflicts, and builds a final lookup table that can be applied at runtime with negligible latency.
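
To make step 2 more tangible, here is a minimal sketch of how a reconstructed logging context could be sent to an LLM and lightly post‑processed. The prompt wording, the example context, and the use of the `openai` Python client with a GPT‑4 model are illustrative assumptions; the paper does not publish its exact prompts.

```python
import re
from openai import OpenAI  # assumes the openai>=1.0 Python client is installed

client = OpenAI()

# Hypothetical logging context reconstructed by the static analyzer (step 1):
# the logging call plus the surrounding statements that define its variables.
context = '''
String path = request.getPath();
int retries = conf.getInt("dfs.client.retries", 3);
LOG.warn("Failed to open " + path + " after " + retries + " retries");
'''

PROMPT = (
    "Given the logging statement and its surrounding code below, output the "
    "log template on a single line. Keep constant text verbatim and replace "
    "every runtime variable with the placeholder <*>.\n\n{code}"
)

resp = client.chat.completions.create(
    model="gpt-4",  # the paper cites GPT-4 as an example; any capable model could be substituted
    messages=[{"role": "user", "content": PROMPT.format(code=context)}],
    temperature=0,
)
template = resp.choices[0].message.content.strip()

# Simple post-processing: collapse runs of consecutive placeholders so that
# concatenated variables yield a single <*> token.
template = re.sub(r"(<\*>)(\s*<\*>)+", r"\1", template)
print(template)  # expected along the lines of: Failed to open <*> after <*> retries
```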

The whole pipeline runs offline (once per release), so the heavy LLM inference happens only during development, not in the production logging pipeline; at runtime, log lines are matched against the precomputed lookup table, as sketched below.
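Under the same assumptions, the runtime side could look like the following sketch: the fused templates are compiled into regular expressions once, offline, and each production log line is matched against that table without any LLM call. The template strings and the linear-scan matching are illustrative; the paper's actual lookup structure may differ.

```python
import re

# Offline (build stage): compile the fused template set into regexes once.
TEMPLATES = [
    "Failed to open <*> after <*> retries",
    "Block <*> replicated to <*> nodes in <*> ms",
]

def compile_template(template: str) -> re.Pattern:
    """Turn a '<*>'-style template into an anchored regex with capture groups."""
    return re.compile(
        "^" + re.escape(template).replace(re.escape("<*>"), r"(.+?)") + "$"
    )

LOOKUP = [(t, compile_template(t)) for t in TEMPLATES]

# Online (production): match each raw line against the table -- no LLM inference.
def parse(line: str):
    for template, pattern in LOOKUP:
        match = pattern.match(line)
        if match:
            return template, match.groups()
    return None, ()  # unmatched lines would fall back to the data-driven parser

print(parse("Failed to open /tmp/app.log after 3 retries"))
# -> ('Failed to open <*> after <*> retries', ('/tmp/app.log', '3'))
```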

Results & Findings

| Dataset | Baseline (Drain) F1 | LLM‑only (prompted) F1 | LLM‑SrcLog F1 |
|---|---|---|---|
| Hadoop | 0.84 | 0.86 | 0.93 (+9 %) |
| Zookeeper | 0.81 | 0.83 | 0.95 (+12 %) |
| Sunfire‑Compute (industrial) | 0.78 | 0.80 | 0.96 (+16 %) |

  • Latency: Average per‑log parsing time drops from ~150 ms (per‑log LLM) to ~0.15 ms, matching traditional parsers.
  • Coverage: >95 % of logs are matched by the code‑driven extractor; the remaining logs are handled by the clustering fallback.
  • Case studies: In a production AIOps pipeline, LLM‑SrcLog reduced false‑positive anomaly alerts by 22 % and cut the time to diagnose a failure from 30 min to under 5 min.

Practical Implications

  • Faster AIOps pipelines – Accurate templates enable downstream anomaly detection and root‑cause analysis tools to work with cleaner, structured data, reducing noise and alert fatigue.
  • Lower operational cost – By moving the expensive LLM inference to the build stage, teams avoid the need for costly GPU resources in production environments.
  • Easier onboarding of new services – Developers can generate log templates automatically as part of CI/CD, ensuring consistency across microservices without manual template maintenance.
  • Improved observability for legacy code – The hybrid design means even systems without source access (e.g., third‑party binaries) still benefit from data‑driven parsing, while any available code is leveraged for higher precision.
  • Security & compliance – Structured logs make it simpler to apply masking, retention policies, and audit trails, which is valuable for regulated industries.

Limitations & Future Work

  • Dependency on logging conventions – The static analyzer assumes a relatively uniform logging API; heavily customized or dynamically generated log statements may be missed.
  • LLM prompt engineering – Quality of extracted templates hinges on prompt design; adapting to different programming languages or frameworks may require additional tuning.
  • Scalability of code analysis – For extremely large monorepos, the static analysis step can become a bottleneck; incremental analysis techniques are suggested as a next step.
  • Extending to multi‑modal logs – Future work includes handling structured logs that embed JSON or protobuf payloads, and integrating feedback loops from runtime parsing errors to refine the template set automatically.

Authors

  • Jiaqi Sun
  • Wei Li
  • Heng Zhang
  • Chutong Ding
  • Shiyou Qian
  • Jian Cao
  • Guangtao Xue

Paper Information

  • arXiv ID: 2512.04474v1
  • Categories: cs.SE
  • Published: December 4, 2025