[Paper] AgenticAKM: Enroute to Agentic Architecture Knowledge Management
Source: arXiv:2602.04445v1
Overview
The paper AgenticAKM: Enroute to Agentic Architecture Knowledge Management tackles a long‑standing pain point for developers and architects: keeping software architecture documentation up‑to‑date. By orchestrating multiple specialized LLM‑driven “agents” that work together to extract, retrieve, generate, and validate architectural knowledge, the authors demonstrate a practical way to automate the creation of Architecture Decision Records (ADRs) directly from code repositories.
Key Contributions
- Agentic workflow for AKM – Introduces a multi‑agent pipeline (Extraction, Retrieval, Generation, Validation) that decomposes the complex task of architecture recovery into tractable sub‑tasks.
- Prototype for ADR generation – Implements the workflow on real‑world GitHub repositories, automatically producing ADRs that capture design decisions.
- Empirical user study – Evaluates the approach on 29 open‑source projects, showing higher‑quality ADRs than a single‑prompt baseline.
- Open discussion of prompt engineering limits – Highlights why a naïve “one‑prompt‑fits‑all” strategy fails for distributed architectural knowledge.
Methodology
- Problem decomposition – The authors view architecture knowledge management as a series of steps rather than a monolithic query.
- Specialized agents
- Extraction Agent scans the codebase (e.g., build files, configuration, source code) and pulls out low‑level artefacts (components, dependencies, patterns).
- Retrieval Agent searches existing documentation, issue trackers, and commit messages to locate any prior architectural rationale.
- Generation Agent feeds the collected artefacts into an LLM prompt that drafts an ADR, following a standard template (Context, Decision, Status, Consequences).
- Validation Agent runs consistency checks (e.g., does the ADR reference existing code? Are required fields filled?) and asks the LLM to refine the draft if needed.
- Iterative loop – If validation fails, the Generation Agent is invoked again with additional context, mimicking a human reviewer’s back‑and‑forth.
- Implementation – The prototype uses OpenAI’s GPT‑4 API, a simple file‑system crawler, and a vector store for retrieval. The whole pipeline is orchestrated with a lightweight task‑queue.
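The four‑step pipeline and its validation loop can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the agent functions are stand‑ins (the real prototype calls the GPT‑4 API where `generation_agent` appears here), and all names and heuristics are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ADR:
    """Draft following the standard template: Context, Decision, Status, Consequences."""
    context: str
    decision: str
    status: str
    consequences: str

def extraction_agent(repo_files):
    """Scan the codebase and pull out low-level artefacts.
    Stand-in: flag dependency-like lines; the prototype inspects build files too."""
    return [line for body in repo_files.values() for line in body.splitlines()
            if "import" in line or "depends" in line]

def retrieval_agent(docs, query_terms):
    """Locate prior rationale in docs/issues/commits.
    Stand-in: keyword match; the prototype uses a vector store."""
    return [d for d in docs if any(t in d for t in query_terms)]

def generation_agent(artefacts, rationale):
    """Stand-in for the LLM call that drafts an ADR from the collected context."""
    return ADR(
        context="; ".join(rationale) or "No prior rationale found",
        decision=f"Adopt the design implied by {len(artefacts)} extracted artefacts",
        status="proposed",
        consequences="Documentation now records the recovered decision",
    )

def validation_agent(adr):
    """Consistency check: every required template field must be filled."""
    return all([adr.context, adr.decision, adr.status, adr.consequences])

def run_pipeline(repo_files, docs, max_rounds=3):
    """Extraction -> Retrieval -> Generation -> Validation, retrying with
    extra context on failure, mimicking a reviewer's back-and-forth."""
    artefacts = extraction_agent(repo_files)
    rationale = retrieval_agent(docs, ["architecture", "decision"])
    for _ in range(max_rounds):
        draft = generation_agent(artefacts, rationale)
        if validation_agent(draft):
            return draft
        rationale.append("Reviewer feedback: fill the missing template fields")
    raise RuntimeError("validation failed after retries")
```

The decomposition is what keeps each LLM call small: each agent sees only the context it needs, which is how the paper sidesteps the token‑limit issues of a single giant prompt.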
Results & Findings
- Quality boost – In the user study, 78% of the ADRs produced by AgenticAKM were rated “useful” or “very useful” by participants, versus 52% for the single‑prompt baseline.
- Reduced manual effort – Participants reported a 40% drop in time spent writing ADRs when they could start from the agent‑generated drafts.
- Higher coverage – The multi‑agent system uncovered architectural decisions that were completely missing from the original documentation in 6 of the 29 repositories.
- Prompt length management – By splitting the problem, each LLM call stayed well within token limits, avoiding the truncation issues that plagued the naïve approach.
Practical Implications
- Automated ADR pipelines – Teams can plug AgenticAKM into CI/CD to continuously generate or update ADRs as code evolves, keeping documentation in sync without extra overhead.
- Onboarding acceleration – New hires get instant, LLM‑generated summaries of key design choices, shortening the learning curve.
- Compliance & audit readiness – Regular, machine‑produced architecture records help satisfy regulatory or internal governance requirements.
- Extensible to other artefacts – The same agentic pattern could be repurposed for generating design docs, API contracts, or migration guides, making it a reusable building block for knowledge automation.
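A minimal sketch of what the CI/CD hook from the first point could look like: hash the tracked source files on each run and only regenerate ADR drafts when the fingerprint changes. `source_fingerprint` and `adr_is_stale` are hypothetical helpers, not part of the published prototype.

```python
import hashlib
import pathlib

def source_fingerprint(paths):
    """Hash the tracked files so the hook can detect architectural drift."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(pathlib.Path(p).read_bytes())
    return h.hexdigest()

def adr_is_stale(adr_dir, fingerprint):
    """Compare against the fingerprint stored at the last ADR generation.
    The marker-file convention here is an assumption for illustration."""
    marker = pathlib.Path(adr_dir) / ".last_fingerprint"
    return not marker.exists() or marker.read_text() != fingerprint

def ci_step(source_paths, adr_dir, regenerate):
    """Run inside CI: regenerate ADR drafts only when sources changed."""
    fp = source_fingerprint(source_paths)
    if adr_is_stale(adr_dir, fp):
        regenerate()  # e.g. invoke the agent pipeline on the changed repo
        (pathlib.Path(adr_dir) / ".last_fingerprint").write_text(fp)
        return True
    return False
```

Gating on a fingerprint keeps LLM costs proportional to actual change, which matters if the pipeline runs on every commit.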
Limitations & Future Work
- LLM hallucination risk – Although the Validation Agent mitigates obvious errors, the system can still produce plausible‑but‑incorrect rationales, especially when source code lacks clear patterns.
- Domain specificity – The prototype was evaluated on open‑source Java/JavaScript projects; performance on legacy codebases, micro‑service ecosystems, or low‑level systems remains untested.
- Scalability of retrieval – The current vector store works for modest repositories; larger monorepos may need more sophisticated indexing and chunking strategies.
- Future directions – The authors plan to (1) integrate static analysis tools for richer artefact extraction, (2) experiment with fine‑tuned LLMs to reduce hallucinations, and (3) broaden the evaluation to industrial settings with stricter security constraints.
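To make the retrieval‑scalability point concrete, here is the simplest chunking strategy a vector store might use, a fixed‑size sliding window. The paper does not specify the prototype's chunking, so this is illustrative; a large monorepo would likely need structure‑aware splitting (per file, per class) instead.

```python
def chunk_text(text, size=800, overlap=100):
    """Split text into fixed-size chunks with overlapping edges so that
    a decision straddling a boundary is still retrievable from one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap parameter trades index size for recall: larger overlap duplicates more text in the store but reduces the chance that a rationale is cut in half at a chunk boundary.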
Authors
- Rudra Dhar
- Karthik Vaidhyanathan
- Vasudeva Varma
Paper Information
- arXiv ID: 2602.04445v1
- Categories: cs.SE
- Published: February 4, 2026