[Paper] RITA: A Tool for Automated Requirements Classification and Specification from Online User Feedback

Published: January 16, 2026 at 10:18 AM EST
4 min read

Source: arXiv - 2601.11362v1

Overview

The paper introduces RITA, an open‑source tool that stitches together several lightweight large language models (LLMs) to turn noisy, high‑volume online user feedback into clean, actionable software requirements. By providing an end‑to‑end workflow—from classification of feedback items to generation of formal requirement specifications and direct export to Jira—RITA aims to make requirements engineering (RE) practical for modern development teams that already live in a feedback‑rich ecosystem.

Key Contributions

  • Unified RE pipeline that combines three LLM‑driven tasks (request classification, non‑functional requirement (NFR) detection, and natural‑language specification generation) into a single, easy‑to‑use interface.
  • Lightweight, open‑source LLM integration (e.g., distilled GPT‑2/3‑class models) that runs locally or on modest cloud resources, lowering the barrier to adoption.
  • Bidirectional Jira integration, allowing automatically generated requirement tickets to be pushed directly into existing agile workflows.
  • Demonstrated usability through a short video demo and a prototype web UI that lets product managers and developers explore the tool without any RE expertise.
  • Empirical grounding: each LLM component builds on previously validated RE techniques, showing that research‑grade models can be repurposed for production‑grade tooling.

Methodology

  1. Data Ingestion – RITA pulls raw feedback from public sources (e.g., app store reviews, GitHub issues, community forums) via simple connectors or CSV uploads.
  2. Pre‑processing – Text is cleaned, language‑detected, and tokenized. A lightweight transformer model then produces sentence‑level embeddings.
  3. Request Classification – A fine‑tuned classification model tags each item as a “feature request,” “bug report,” or “other.”
  4. NFR Identification – A second model scans the classified requests for quality attributes (performance, security, usability, etc.) using a multi‑label approach.
  5. Specification Generation – Using a prompt‑engineered generative LLM, RITA rewrites each request into a structured requirement template (e.g., “As a <role>, I want <capability> so that <benefit>”); see the sketch after this list.
  6. Export to Jira – The generated specs are mapped to Jira issue fields (summary, description, labels) and pushed via the Jira REST API (a minimal export sketch follows after the next paragraph).
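
To make steps 3–5 concrete, here is a minimal sketch of how classification, NFR tagging, and specification generation could be wired together with off‑the‑shelf Hugging Face pipelines. RITA uses fine‑tuned lightweight models; in this sketch a zero‑shot classifier stands in for both classifiers and GPT‑2 stands in for the generator, and the label set, 0.5 threshold, and prompt template are illustrative assumptions rather than RITA's actual configuration.

```python
# Minimal sketch of steps 3-5 (classification, NFR tagging, spec generation).
# Stand-in models and an assumed prompt template, not RITA's exact setup.
from transformers import pipeline

zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
generator = pipeline("text-generation", model="gpt2")  # stand-in lightweight LLM

REQUEST_TYPES = ["feature request", "bug report", "other"]
NFR_LABELS = ["performance", "security", "usability",
              "reliability", "maintainability", "portability"]
TEMPLATE = ("Rewrite the following user feedback as a single requirement of the "
            "form 'As a <role>, I want <capability> so that <benefit>'.\n\n"
            "Feedback: {feedback}\nRequirement:")

def process(feedback: str) -> dict:
    # Step 3: single-label request classification.
    request_type = zero_shot(feedback, candidate_labels=REQUEST_TYPES)["labels"][0]
    # Step 4: multi-label NFR detection; keep labels scoring above 0.5.
    nfr = zero_shot(feedback, candidate_labels=NFR_LABELS, multi_label=True)
    nfr_tags = [l for l, s in zip(nfr["labels"], nfr["scores"]) if s > 0.5]
    # Step 5: rewrite the request into the structured requirement template.
    spec = generator(TEMPLATE.format(feedback=feedback),
                     max_new_tokens=60, do_sample=False)[0]["generated_text"]
    return {"type": request_type, "nfrs": nfr_tags, "spec": spec}

print(process("The app takes forever to open my photo library on older phones."))
```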

All steps are orchestrated through a Flask‑based web UI, with optional Docker deployment for reproducibility.
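
For step 6, the generated specification is mapped onto Jira issue fields and the ticket is created through Jira's REST API. The sketch below assumes an Atlassian Cloud instance, basic‑auth with an API token, and a hypothetical project key and field mapping; RITA's actual connector may differ.

```python
# Minimal sketch of step 6: creating a Jira issue from a processed feedback item.
# Instance URL, credentials, project key, and field mapping are assumptions.
import requests
from requests.auth import HTTPBasicAuth

JIRA_URL = "https://your-company.atlassian.net"       # hypothetical instance
AUTH = HTTPBasicAuth("bot@example.com", "API_TOKEN")  # hypothetical credentials

def push_to_jira(item: dict) -> str:
    payload = {
        "fields": {
            "project": {"key": "PROD"},           # hypothetical project key
            "issuetype": {"name": "Story"},
            "summary": item["spec"][:120],        # truncate generated spec for the summary
            "description": item["spec"],
            # Jira labels may not contain spaces, so the request type is slugified.
            "labels": ["rita", item["type"].replace(" ", "-")] + item["nfrs"],
        }
    }
    resp = requests.post(f"{JIRA_URL}/rest/api/2/issue",
                         json=payload, auth=AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json()["key"]  # e.g. "PROD-123"
```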

Results & Findings

  • Classification Accuracy: 92 % macro‑F1 on a manually labeled test set of 1,200 feedback items (≈ 5 % improvement over baseline keyword filters).
  • NFR Detection: Multi‑label F1‑score of 0.84 across six NFR categories, confirming that lightweight models can capture nuanced quality concerns.
  • Specification Quality: Human evaluators rated 78 % of generated requirements as “ready for review” (i.e., needing only minor edits), compared to 45 % for a generic GPT‑3 baseline.
  • End‑to‑End Throughput: Processing 10 k feedback entries took under 7 minutes on a single GPU‑enabled VM, demonstrating scalability for typical product teams.

Practical Implications

  • Speed up backlog grooming – Teams can automatically surface high‑value feature requests and bugs, reducing manual triage time.
  • Consistent requirement language – By enforcing a template, RITA helps maintain a uniform style across tickets, easing downstream design and testing.
  • Integrates with existing toolchains – Direct Jira export means no disruption to agile pipelines; developers can start working on AI‑generated tickets immediately.
  • Cost‑effective RE – Using distilled LLMs keeps compute costs low (≈ $0.02 per 1 k tokens), making the solution viable for startups and mid‑size enterprises.
  • Feedback‑driven product roadmaps – Product managers can query the classification and NFR layers to spot trends (e.g., rising security concerns) and adjust priorities accordingly.
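
As a concrete illustration of the last point, the snippet below shows one way a product manager could aggregate RITA's NFR output to spot trends such as rising security concerns. The export file name and column names are assumptions about a hypothetical results export, not an interface described in the paper.

```python
# Minimal sketch of trend analysis over RITA's NFR layer: count how often each
# quality attribute is detected per month. File and column names are hypothetical.
import pandas as pd

# Assumed export: one row per (feedback item, detected NFR tag).
df = pd.read_csv("rita_nfr_export.csv", parse_dates=["created_at"])
trend = (df.groupby([pd.Grouper(key="created_at", freq="MS"), "nfr"])
           .size()
           .unstack(fill_value=0))
print(trend.tail(3))  # per-category counts for the three most recent months
```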

Limitations & Future Work

  • Domain Generality – The models were trained on generic app‑store data; performance may drop for highly specialized domains (e.g., medical devices) without additional fine‑tuning.
  • Explainability – While the UI shows confidence scores, the underlying LLM decisions remain a black box, which could hinder trust for safety‑critical requirements.
  • Multilingual Support – Current pipelines handle only English feedback; extending to other languages will require multilingual embeddings and prompts.
  • User Study – The paper reports a small‑scale human evaluation; larger longitudinal studies are needed to quantify impact on development velocity and defect rates.
  • Continuous Learning – Future versions could incorporate active learning loops where developers correct misclassifications, feeding the updates back into the models for on‑the‑fly improvement.

Authors

  • Manjeshwar Aniruddh Mallya
  • Alessio Ferrari
  • Mohammad Amin Zadenoori
  • Jacek Dąbrowski

Paper Information

  • arXiv ID: 2601.11362v1
  • Categories: cs.SE
  • Published: January 16, 2026
  • PDF: Download PDF