[Paper] Towards a Software Reference Architecture for Natural Language Processing Tools in Requirements Engineering
Source: arXiv - 2602.17498v1
Overview
The paper proposes a shift from building one‑off, monolithic Natural Language Processing (NLP) tools for Requirements Engineering (RE) to creating a modular, reusable ecosystem. By outlining a software reference architecture (SRA) and a concrete research roadmap, the authors aim to make NLP‑for‑RE (NLP4RE) tools easier to develop, compare, and maintain over the long term.
Key Contributions
- Vision of an interoperable NLP4RE ecosystem – moving away from isolated tools toward reusable components.
- Research roadmap for designing a software reference architecture, following a recognized SRA development methodology.
- Stakeholder‑driven requirements elicitation – a focus‑group session that produced 36 generic system requirements for NLP4RE tools.
- Initial blueprint that identifies core architectural layers (e.g., data ingestion, preprocessing, NLP services, RE‑specific analytics, integration & orchestration).
- Discussion of sustainability challenges such as tool abandonment, lack of benchmarking, and documentation gaps, and how an SRA can mitigate them.
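To make the layered blueprint more concrete, here is a minimal Python sketch of how data might flow through the layers named above. It is an illustration under stated assumptions, not the architecture from the paper: the `Document` type, the function names, and the toy "vague term" check are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Data handed from layer to layer (all field names are hypothetical)."""
    text: str
    tokens: List[str] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)


# Every layer after ingestion shares one signature, so layers can be swapped.
Layer = Callable[[Document], Document]


def ingest(raw: str) -> Document:
    """Data ingestion layer: wrap raw requirement text."""
    return Document(text=raw.strip())


def preprocess(doc: Document) -> Document:
    """Preprocessing layer: naive lowercasing and whitespace tokenisation."""
    doc.tokens = doc.text.lower().rstrip(".").split()
    return doc


def nlp_service(doc: Document) -> Document:
    """NLP service layer: a toy lexicon lookup standing in for a real model."""
    vague_terms = {"fast", "easy", "user-friendly"}
    doc.labels = [t for t in doc.tokens if t in vague_terms]
    return doc


def re_analytics(doc: Document) -> Document:
    """RE-specific analytics layer: turn labels into requirement findings."""
    doc.findings = [f"ambiguous term: {t}" for t in doc.labels]
    return doc


def orchestrate(raw: str, layers: List[Layer]) -> Document:
    """Integration and orchestration layer: run the layers in order."""
    doc = ingest(raw)
    for layer in layers:
        doc = layer(doc)
    return doc


result = orchestrate("  The system shall be fast. ",
                     [preprocess, nlp_service, re_analytics])
print(result.findings)
```

Because each layer has the same signature, replacing the toy `nlp_service` with a different component would not require changes to the other layers, which is the kind of reuse the blueprint is after.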
Methodology
- Adopt a standard SRA development framework – the authors follow a well‑established, step‑by‑step process (requirements analysis, architectural design, validation, etc.).
- Stakeholder focus group – RE practitioners, NLP researchers, and tool developers participated in a structured workshop to surface common functional and non‑functional needs.
- Requirement synthesis – the raw input was distilled into 36 high‑level system requirements covering modularity, extensibility, configurability, traceability, and performance.
- Roadmap drafting – based on the requirements, the authors outline short‑, medium‑ and long‑term research activities needed to flesh out the SRA (e.g., defining component interfaces, creating reference implementations, establishing evaluation metrics).
The approach stays practical: rather than proposing a finished architecture, the paper builds a living blueprint that can evolve with community feedback.
Results & Findings
- 36 generic requirements were identified, indicating that current NLP4RE tools share many overlapping needs (e.g., language‑agnostic preprocessing, reusable classification pipelines, traceability to requirements artifacts).
- Clear gaps were highlighted: most existing tools lack standardized APIs, versioning, and a plug‑and‑play model, leading to duplicated effort.
- Feasibility of an SRA – the stakeholder consensus suggests that a modular reference architecture would be welcomed and could serve as a common “lingua franca” for tool developers.
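The gaps listed above (no standardized APIs, no versioning, no plug‑and‑play model) can be illustrated with a small component registry. This is a hedged sketch only: the `register` and `resolve` helpers, the component names, and the choice to key components by major version are assumptions made for illustration and do not come from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shared interface: a component maps text to a list of strings.
Component = Callable[[str], List[str]]

# Components are keyed by (name, major version) so that several
# incompatible versions can coexist.
_REGISTRY: Dict[Tuple[str, int], Component] = {}


def register(name: str, major: int) -> Callable[[Component], Component]:
    """Decorator that publishes a component under a name and major version."""
    def decorator(fn: Component) -> Component:
        _REGISTRY[(name, major)] = fn
        return fn
    return decorator


def resolve(name: str, major: int) -> Component:
    """Look up a component; fail loudly if that version is not registered."""
    try:
        return _REGISTRY[(name, major)]
    except KeyError:
        raise LookupError(
            f"no component {name!r} with major version {major}"
        ) from None


@register("tokeniser", major=1)
def whitespace_tokeniser(text: str) -> List[str]:
    return text.split()


@register("tokeniser", major=2)
def lowercase_tokeniser(text: str) -> List[str]:
    return text.lower().split()


# A tool pins the version it was built against and is unaffected by v2.
tokenise = resolve("tokeniser", major=1)
print(tokenise("The System shall respond"))
```

Pinning a major version is one possible way to let a tool keep working while newer, incompatible components are published alongside it.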
Practical Implications
- Faster prototyping – developers can assemble a new NLP4RE solution by wiring together existing modules (e.g., a tokeniser, a domain‑specific classifier, a validation engine) instead of coding everything from scratch.
- Easier benchmarking – standardized interfaces enable fair, reproducible comparisons across different NLP techniques applied to the same RE tasks.
- Reduced maintenance burden – modular components can be updated independently, extending the lifespan of tools and lowering the risk of abandonment after a conference paper is published.
- Integration with DevOps pipelines – an SRA that exposes RESTful or gRPC APIs fits naturally into CI/CD workflows, allowing automated requirement analysis as part of continuous delivery.
- Community‑driven ecosystem – open‑source repositories can host reusable modules, encouraging contributions, shared best practices, and a marketplace of plug‑ins for specific domains (e.g., automotive, medical).
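As a rough illustration of the benchmarking point above, the sketch below runs two interchangeable classifiers over the same labelled requirements through one shared interface. Everything in it is hypothetical: the `Classifier` signature, both classifiers, and the four‑item dataset are invented for this example and are not results or data from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Assumed common interface: requirement text in, class label out.
Classifier = Callable[[str], str]

# Invented toy data, for illustration only.
DATASET: List[Tuple[str, str]] = [
    ("The system shall encrypt all stored passwords.", "non-functional"),
    ("The user shall be able to export reports as PDF.", "functional"),
    ("The page shall load within two seconds.", "non-functional"),
    ("The admin shall be able to delete accounts.", "functional"),
]


def keyword_classifier(text: str) -> str:
    """Toy rule-based classifier driven by a few surface cues."""
    cues = ("encrypt", "within", "seconds")
    lowered = text.lower()
    return "non-functional" if any(c in lowered for c in cues) else "functional"


def majority_baseline(text: str) -> str:
    """Trivial baseline that always predicts the same class."""
    return "functional"


def accuracy(clf: Classifier, data: List[Tuple[str, str]]) -> float:
    """Fraction of items where the prediction equals the gold label."""
    correct = sum(1 for text, gold in data if clf(text) == gold)
    return correct / len(data)


def benchmark(classifiers: Dict[str, Classifier],
              data: List[Tuple[str, str]]) -> Dict[str, float]:
    """Evaluate every classifier on the same data with the same metric."""
    return {name: accuracy(clf, data) for name, clf in classifiers.items()}


scores = benchmark(
    {"keyword": keyword_classifier, "baseline": majority_baseline},
    DATASET,
)
print(scores)
```

The harness never inspects how a classifier works internally, so any technique that honours the shared signature can be compared on equal terms.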
Limitations & Future Work
- Scope limited to requirements gathering – the paper stops at the requirements and roadmap stage; no concrete reference implementation exists yet.
- Stakeholder sample size – the focus group involved a relatively small, possibly domain‑biased set of participants, which may not capture all industry nuances.
- Tooling heterogeneity – reconciling diverse programming languages, data formats, and legacy systems will be a non‑trivial engineering challenge.
- Future work includes building prototype modules, validating the architecture on real‑world RE projects, and establishing governance mechanisms for the ecosystem (e.g., versioning policies, security standards).
Authors
- Julian Frattini
- Quim Motger
Paper Information
- arXiv ID: 2602.17498v1
- Categories: cs.SE
- Published: February 19, 2026