[Paper] Using LLMs to Evaluate Architecture Documents: Results from a Digital Marketplace Environment
Source: arXiv - 2601.19693v1
Overview
Generative AI plays an increasing role in software engineering activities, for example to make them more efficient or to improve quality. However, it is often unclear how much benefit LLMs really provide. We focus on software architects and investigated how an LLM‑supported evaluation of architecture documents can help architects improve such artifacts. In the context of a research project in which a digital marketplace is being developed and digital solutions are analyzed, we used different LLMs to assess the quality of architecture documents and compared the results with evaluations by software architects. We found that the quality of the artifact strongly influences the quality of the LLM‑based evaluation: the better the quality of the architecture document, the more consistent the LLM‑based evaluation was with the human expert evaluation. While using LLMs for this architecture task is promising, our results showed inconsistencies that need further analysis before the findings can be generalized.
Key Contributions
This paper makes the following contributions:
- An LLM‑supported evaluation of architecture documents, compared against evaluations by human software architects
- Evidence that the quality of the architecture document strongly influences how consistent the LLM‑based evaluation is with the expert evaluation
- A case study set in a research project developing a digital marketplace
Methodology
The authors used different LLMs to evaluate the quality of architecture documents from a digital marketplace research project and compared the LLM‑based evaluations with evaluations by human software architects. Please refer to the full paper for the detailed methodology.
Practical Implications
The findings suggest that LLM‑supported evaluation of architecture documents is promising, particularly for higher‑quality documents, but the observed inconsistencies mean the approach should not be generalized without further analysis.
Authors
- Frank Elberzhager
- Matthias Gerbershagen
- Joshua Ginkel
Paper Information
- arXiv ID: 2601.19693v1
- Categories: cs.SE
- Published: January 27, 2026