[Paper] Evaluating Design Conformance Through Trace Comparison
Source: arXiv - 2605.07909v1
Overview
Design and implementation often happen at different times and are carried out by different people, so a system can drift away from its original architectural intent. Anderson and Reza propose a lightweight, quantitative way to check that an implementation still matches its design: compare OpenTelemetry traces from the running system against “design traces” derived from the intended model.
Key Contributions
- Trace‑based conformance metric – Introduces a concrete percentage that measures how closely live execution traces match the design specification.
- Adaptation of process‑mining techniques – Re‑uses well‑established conformance checking methods from process mining for the domain of distributed systems.
- OpenTelemetry‑centric pipeline – Builds the entire workflow on the industry‑standard OpenTelemetry ecosystem, so it applies wherever services are instrumented with standard OpenTelemetry SDKs and needs no bespoke instrumentation.
- Long‑term drift detection – Shows how the metric can be tracked over weeks or months to surface gradual design erosion before it becomes a reliability problem.
- Prototype implementation & case study – Provides an open‑source prototype and validates it on a microservice‑based e‑commerce demo, demonstrating practical feasibility.
Methodology
- Design Trace Generation – The authors start from a high‑level architectural model (e.g., a sequence diagram or a BPMN process) and automatically emit a set of design traces that represent the ideal ordering of spans and events.
- Instrumentation with OpenTelemetry – The target application is instrumented using standard OpenTelemetry SDKs, producing runtime traces that capture spans, timestamps, and attributes across service boundaries.
- Alignment & Conformance Checking – Using a variant of the “alignment” algorithm from process mining, each runtime trace is aligned to the closest design trace. Mismatches (missing, extra, or reordered spans) are penalized, and a conformance score (0‑100 %) is computed per trace.
- Aggregation & Trend Analysis – Scores are aggregated across requests and visualized over time, allowing teams to spot regressions or steady drift.
- Tooling – The prototype consists of three components: a trace exporter (OpenTelemetry Collector), an alignment engine (Python/Java), and a dashboard (Grafana) that displays the conformance metric alongside traditional performance KPIs.
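The alignment step can be illustrated with a minimal sketch. It assumes each trace is reduced to an ordered list of span names and uses a plain insert/delete edit distance as a stand-in for the process-mining alignment. The normalization, the function names `alignment_cost` and `conformance`, and the span names are illustrative assumptions, not the authors' implementation.

```python
def alignment_cost(runtime, design):
    """Minimum number of insert/delete moves needed to turn `runtime` into `design`."""
    n, m = len(runtime), len(design)
    # prev[j] holds the cost of aligning runtime[:i-1] with design[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            if runtime[i - 1] == design[j - 1]:
                curr[j] = prev[j - 1]                    # matching span, no penalty
            else:
                curr[j] = 1 + min(prev[j], curr[j - 1])  # extra or missing span
        prev = curr
    return prev[m]


def conformance(runtime, design_traces):
    """Score (0-100) of `runtime` against the closest design trace."""
    best = 0.0
    for design in design_traces:
        worst = len(runtime) + len(design)  # cost if nothing matches at all
        if worst == 0:
            score = 100.0
        else:
            score = 100.0 * (1 - alignment_cost(runtime, design) / worst)
        best = max(best, score)
    return best


# Hypothetical span names for a checkout request.
design_traces = [["gateway", "cart", "payment", "email"]]
reordered = ["gateway", "payment", "cart", "email"]
print(conformance(reordered, design_traces))  # 75.0
```

Under this assumed scoring, a reordered pair of spans costs one delete plus one insert, so the example trace loses 2 of a possible 8 moves. A production alignment would typically also weigh attributes and parent/child structure, which this sketch ignores.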
Results & Findings
- High‑fidelity detection – In the e‑commerce case study, the system flagged a 12 % drop in conformance after a refactor that unintentionally introduced an extra asynchronous call, a change that was invisible to standard latency metrics.
- Low overhead – Adding OpenTelemetry instrumentation increased average request latency by < 2 % and added ~5 KB of trace data per request, well within typical production budgets.
- Scalability – The alignment engine processed > 10 k traces per minute on a single commodity node, suggesting feasibility for medium‑scale microservice fleets.
- Actionable insights – Teams could map low‑scoring traces back to specific code paths, enabling targeted code reviews and design updates.
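A drop like the one reported in the case study is visible only once per-trace scores are aggregated over time. The sketch below shows one way that could work, assuming scores arrive as `(timestamp_seconds, score)` pairs. The window size, the threshold, the helper names, and the sample numbers are all invented for illustration and are not data from the paper.

```python
from statistics import mean


def window_means(scored, window_s):
    """Group (timestamp_s, score) pairs into fixed windows; return {window_start: mean}."""
    buckets = {}
    for ts, score in scored:
        start = int(ts // window_s) * window_s
        buckets.setdefault(start, []).append(score)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


def flag_drops(means, threshold):
    """Return window starts whose mean fell by more than `threshold` points
    relative to the previous window."""
    starts = list(means)
    return [b for a, b in zip(starts, starts[1:]) if means[a] - means[b] > threshold]


# Made-up scores: conformance falls after a hypothetical deploy at t=120s.
scored = [(0, 100.0), (30, 96.0), (60, 98.0), (90, 98.0), (120, 85.0), (150, 87.0)]
means = window_means(scored, window_s=60)
print(means)                            # {0: 98.0, 60: 98.0, 120: 86.0}
print(flag_drops(means, threshold=10))  # [120]
```

Comparing adjacent windows catches sudden regressions; gradual drift would need a comparison against a longer-lived baseline instead.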
Practical Implications
- Continuous Design Validation – Developers can embed conformance checks into CI/CD pipelines, catching design violations before they ship.
- Technical Debt Monitoring – The metric serves as a quantitative “design debt” indicator, complementing code coverage and static analysis tools.
- On‑boarding Aid – New team members can quickly verify that their changes respect the intended architecture, reducing knowledge‑transfer friction.
- Compliance & Auditing – For regulated industries (e.g., finance, healthcare), the conformance percentage provides audit‑ready evidence that implementations adhere to documented processes.
- Vendor‑agnostic Observability – Because the approach relies on OpenTelemetry, it works across cloud providers, language runtimes, and existing observability stacks without vendor lock‑in.
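As a rough idea of what a CI/CD conformance check might look like, the sketch below gates a build on a batch of per-trace scores collected from a test run. The function name `ci_gate` and both thresholds are hypothetical; the paper does not prescribe specific values.

```python
from statistics import mean


def ci_gate(scores, min_mean=95.0, min_trace=80.0):
    """Return (passed, reasons) for a batch of per-trace conformance scores.

    Both thresholds are placeholder values a team would tune for itself.
    """
    if not scores:
        return False, ["no traces were scored"]
    reasons = []
    avg = mean(scores)
    if avg < min_mean:
        reasons.append(f"mean conformance {avg:.1f} is below {min_mean:.1f}")
    low = [s for s in scores if s < min_trace]
    if low:
        reasons.append(f"{len(low)} trace(s) scored below {min_trace:.1f}")
    return not reasons, reasons


passed, reasons = ci_gate([100.0, 100.0, 75.0])
print(passed)  # False
for reason in reasons:
    print(reason)
```

A pipeline step could turn the result into an exit code, for example `sys.exit(0 if passed else 1)`, so that a failing gate blocks the merge.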
Limitations & Future Work
- Design Trace Fidelity – The approach assumes the design model can be expressed as a traceable sequence; highly dynamic or data‑dependent flows may be hard to capture.
- False Positives from Non‑functional Variations – Load‑balancing retries or circuit‑breaker patterns can generate extra spans that the current alignment algorithm treats as violations.
- Scalability to Massive Deployments – While the prototype handles more than 10 k traces per minute on a single node, ultra‑large systems (millions of requests/sec) will need distributed alignment or sampling strategies.
- User‑defined Tolerance – Future work should let teams specify acceptable deviations (e.g., optional spans) to reduce noise.
- Integration with Existing SLO/SLI Frameworks – The authors plan to expose conformance as a first‑class SLI, enabling automated alerting and budgeting alongside latency and error‑rate SLOs.
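One possible shape for user-defined tolerance is a normalization pass that runs before alignment. The sketch below assumes traces are lists of span names, drops spans a team has declared optional, and collapses immediate repeats such as retries. This is a hypothetical preprocessing step, not something the authors describe, and the span names are invented.

```python
def normalize(trace, optional=frozenset(), collapse_repeats=True):
    """Remove tolerated noise from a trace before it is aligned.

    optional:          span names that may appear without counting as a violation
    collapse_repeats:  treat back-to-back identical spans (e.g. retries) as one
    """
    out = []
    for name in trace:
        if name in optional:
            continue
        if collapse_repeats and out and out[-1] == name:
            continue
        out.append(name)
    return out


# Hypothetical runtime trace with one retry and a circuit-breaker span.
runtime = ["gateway", "payment", "payment", "circuit-breaker", "email"]
print(normalize(runtime, optional={"circuit-breaker"}))
# ['gateway', 'payment', 'email']
```

The trade-off is that collapsing repeats also hides designs where a service is legitimately called twice in a row, so a real tolerance mechanism would probably need to be configured per design trace.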
Bottom line: By turning design adherence into a measurable, observable metric, this work gives developers a practical tool to keep fast‑moving codebases aligned with their original architectural vision, without reinventing the wheel of instrumentation.
Authors
- Reid Anderson
- Hassan Reza
Paper Information
- arXiv ID: 2605.07909v1
- Categories: cs.SE
- Published: May 8, 2026