[Paper] Using LLMs to Evaluate Architecture Documents: Results from a Digital Marketplace Environment
Source: arXiv - 2601.19693v1
Overview
Generative AI plays an increasing role in software engineering activities, for example to make them more efficient or to improve quality. However, it is often unclear how much benefit LLMs really provide. We focus on software architects and investigated how an LLM‑supported evaluation of architecture documents can help architects improve such artifacts. In the context of a research project in which a digital marketplace is being developed and digital solutions are analyzed, we used different LLMs to assess the quality of architecture documents and compared the results with evaluations by software architects. We found that the quality of the artifact strongly influences the quality of the LLM‑based evaluation: the better the quality of the architecture document, the more consistent the LLM‑based evaluation was with the human expert evaluation. While using LLMs for this architecture task is promising, our results showed inconsistencies that need further analysis before the findings can be generalized.
Key Contributions
This paper makes the following contributions:
- An LLM‑supported evaluation of architecture documents, compared against evaluations by human software architects
- Evidence that the quality of the architecture document strongly influences how consistent the LLM‑based evaluation is with the expert evaluation
- A case study set in a research project developing a digital marketplace
Methodology
The authors used different LLMs to evaluate the quality of architecture documents from a digital marketplace research project and compared the LLM‑based evaluations with evaluations by human software architects. Please refer to the full paper for the detailed methodology.
Practical Implications
The findings suggest that LLM‑supported evaluation of architecture documents is promising, particularly for higher‑quality documents, but the observed inconsistencies mean the approach should not be generalized without further analysis.
Authors
- Frank Elberzhager
- Matthias Gerbershagen
- Joshua Ginkel
Paper Information
- arXiv ID: 2601.19693v1
- Categories: cs.SE
- Published: January 27, 2026