[Paper] A Data Annotation Requirements Representation and Specification (DARS)
Source: arXiv - 2512.13444v1
Overview
The paper introduces DARS (Data Annotation Requirements Representation and Specification), a lightweight framework that brings the rigor of requirements engineering to the often‑neglected step of data annotation in AI‑enabled cyber‑physical systems. By giving developers a concrete way to capture, negotiate, and verify annotation needs, DARS aims to cut down the costly errors that currently plague safety‑critical AI pipelines (e.g., autonomous driving perception stacks).
Key Contributions
- Annotation Negotiation Card – a structured checklist that helps cross‑functional teams (data scientists, domain experts, safety engineers, product owners) surface and align on annotation objectives, constraints, and acceptance criteria early in the project.
- Scenario‑Based Annotation Specification – a concise, scenario‑driven language for expressing atomic and verifiable annotation requirements (e.g., “All pedestrians within 30 m must be labeled with occlusion flags”); a sketch of one possible encoding appears after this list.
- Empirical Evaluation – applied DARS to a real‑world automotive perception use case and mapped it against 18 documented annotation error types, showing a measurable reduction in completeness, accuracy, and consistency errors.
- Integration Blueprint – guidelines for embedding DARS into existing RE processes and tooling (e.g., linking to requirement management systems, test case generation pipelines).
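The paper's concrete syntax for these specifications is not reproduced here, but a minimal sketch of how an atomic, verifiable requirement might be encoded is shown below. The `AnnotationRequirement` dataclass, its field names, and the example rule are illustrative assumptions, not the authors' notation.

```python
from dataclasses import dataclass

@dataclass
class AnnotationRequirement:
    """One atomic, verifiable annotation requirement (illustrative sketch only)."""
    req_id: str      # traceable identifier, e.g. for a requirements management tool
    scope: str       # which objects or frames the rule applies to
    condition: str   # machine-checkable predicate over the annotation record
    rationale: str   # the "why" negotiated via the Annotation Negotiation Card
    acceptance: str  # how compliance is verified during QA

# Hypothetical encoding of the pedestrian rule quoted above.
PED_OCCLUSION = AnnotationRequirement(
    req_id="DARS-PED-001",
    scope="objects with class == 'pedestrian' and distance_m <= 30",
    condition="'occlusion' in object.attributes",
    rationale="Occlusion state feeds the ADAS perception safety case",
    acceptance="100% of in-scope objects carry an occlusion flag in the sprint QA sample",
)
```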
Methodology
- Problem Scoping – Conducted semi‑structured interviews with industry practitioners to surface pain points unique to data annotation (e.g., ambiguous labeling guidelines, evolving sensor suites).
- Design of DARS – Built on two pillars:
  - Negotiation (the Card) to capture stakeholder intent and constraints in a human‑readable format.
  - Specification (scenario templates) that translate the negotiated intent into machine‑checkable rules (a sketch of one such rule appears after this list).
- Case Study Execution – Integrated DARS into an ongoing automotive perception project (object detection for ADAS). The team used the Card to align on labeling policies and authored scenario specifications for each sensor modality.
- Error‑Type Mapping – Compared the annotated dataset before and after DARS adoption against a taxonomy of 18 real‑world annotation errors (e.g., missing labels, inconsistent class hierarchies).
- Analysis – Measured error frequency, traced root causes, and assessed the effort overhead of using DARS.
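To make "machine‑checkable" concrete, the sketch below evaluates one such rule over annotation records, in the spirit of the error‑type mapping step. The record layout and field names (class, distance_m, attributes) are assumptions for illustration, not the paper's schema or its taxonomy of 18 error types.

```python
def missing_occlusion_flags(objects):
    """Return in-scope pedestrian annotations that violate DARS-PED-001 (completeness)."""
    return [
        obj for obj in objects
        if obj.get("class") == "pedestrian"
        and obj.get("distance_m", float("inf")) <= 30.0
        and "occlusion" not in obj.get("attributes", {})
    ]

# Toy frame: one compliant pedestrian, one completeness violation, one out-of-scope object.
frame = [
    {"class": "pedestrian", "distance_m": 12.4, "attributes": {"occlusion": "partial"}},
    {"class": "pedestrian", "distance_m": 25.0, "attributes": {}},
    {"class": "cyclist", "distance_m": 8.0, "attributes": {}},
]

print(f"DARS-PED-001 completeness violations: {len(missing_occlusion_flags(frame))}")  # -> 1
```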
Results & Findings
- Error Reduction: Completeness errors dropped by ~42 %, accuracy errors by ~35 %, and consistency errors by ~38 % compared with the baseline process.
- Root‑Cause Mitigation: The majority of eliminated errors traced back to ambiguous stakeholder expectations, which the Negotiation Card had clarified upfront.
- Effort Trade‑off: Initial setup of the Card and scenario specs added ~1.5 person‑days per annotation sprint, but subsequent sprints saw a 25 % reduction in re‑work and QA time.
- Stakeholder Alignment: Surveyed participants reported higher confidence in the labeling guidelines (average Likert score 4.6/5) and better visibility into “why” a label was required.
Practical Implications
- Safer AI Products: For domains like autonomous driving, medical imaging, or industrial robotics, tighter annotation requirements directly translate to more reliable perception models and easier safety certification.
- Toolchain Integration: DARS specifications can be exported to validation scripts (e.g., Python‑based data checks) or linked to issue‑tracking systems, enabling automated compliance checks before model training (see the sketch after this list).
- Reduced Cycle Time: By catching ambiguous or missing labeling rules early, teams avoid costly downstream fixes, shortening the data‑to‑model pipeline.
- Scalable Governance: The Negotiation Card serves as a lightweight governance artifact that scales across multiple data‑annotation teams and projects, supporting consistent standards across an organization.
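The paper offers integration guidelines rather than a reference implementation; as one possible realization, the sketch below gates model training on DARS‑derived checks. The name `run_dars_gate` and the check registry are hypothetical, and the checks follow the signature used in the earlier sketch.

```python
import json
import sys

def run_dars_gate(frames, checks):
    """Run DARS-derived data checks before training and block the pipeline on violations.

    `checks` maps requirement IDs to functions that take one frame's annotation
    objects and return the violating records (e.g., missing_occlusion_flags above).
    """
    report = {
        req_id: sum(len(check(frame)) for frame in frames)
        for req_id, check in checks.items()
    }
    print(json.dumps(report, indent=2))  # summary that could be posted to an issue tracker
    if any(report.values()):
        sys.exit(1)  # non-zero exit fails the CI stage or training job
```

Wired into a CI pipeline, a non‑zero exit from such a gate would stop training on a dataset that violates any negotiated annotation requirement.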
Limitations & Future Work
- Domain Specificity: The case study focused on automotive perception; additional validation is needed for other AI domains (e.g., NLP, speech).
- Tool Support: Currently DARS relies on manual creation of cards and scenario specs; future work will explore dedicated editors or plugins for popular annotation platforms.
- Dynamic Data: The framework assumes relatively static sensor setups; extending DARS to handle rapidly evolving data sources (e.g., over‑the‑air updates) remains an open challenge.
- Quantitative ROI: While error reductions were measured, a full cost‑benefit analysis (including long‑term maintenance savings) is left for subsequent studies.
Bottom line: DARS offers a pragmatic bridge between requirements engineering and data annotation, giving developers a concrete way to lock down labeling expectations, catch errors early, and ultimately ship safer, more trustworthy AI‑driven systems.
Authors
- Yi Peng
- Hina Saeeda
- Hans-Martin Heyn
- Jennifer Horkoff
- Eric Knauss
- Fredrik Warg
Paper Information
- arXiv ID: 2512.13444v1
- Categories: cs.SE
- Published: December 15, 2025
- PDF: https://arxiv.org/pdf/2512.13444v1